I. Background

In speech communication applications such as teleconferencing, hands-free telephony and hearing aids,
the presence of background noise and/or reverberation may significantly reduce the intelligibility of the de- 
sired speech signal. This stems from the large distance between the speaker and the microphone(s). Hence, 
the use of a noise reduction algorithm is necessary. Multi-microphone systems exploit spatial informa- 
tion in addition to temporal and spectral information of the desired signal and noise signal and are thus 
preferred to single microphone procedures (such as spectral subtraction). For aesthetic reasons,
multi-microphone techniques for, e.g., hearing aid applications go together with the use of small-sized arrays.
Considerable noise reduction can be achieved with such arrays, but at the expense of an increased
sensitivity to errors in the assumed signal model such as microphone mismatch and reverberation [1, 2]. In
hearing aids, microphones are rarely matched in gain and phase. In [3], e.g., gain and phase differences
between microphone characteristics of up to 6 dB and 10°, respectively, have been reported.

A widely studied multi-channel adaptive noise reduction algorithm is the Generalized Sidelobe Canceller
(GSC) [2]-[11], depicted in Figure 1. The GSC consists of a fixed, spatial pre-processor, which
includes a fixed beamformer and a blocking matrix, and an adaptive stage based on an Adaptive Noise Can- 
celler (ANC) [12]. The ANC minimizes the output noise power while the blocking matrix should avoid 
speech leakage into the noise references. The standard GSC assumes the desired speaker location, the mi- 
crophone characteristics and positions to be known, and reflections of the speech signal to be absent. If these 
assumptions are fulfilled, it provides an undistorted enhanced speech signal with minimum residual noise.
However, in reality these assumptions are often violated, resulting in so-called speech leakage and hence 
speech distortion. To limit speech distortion, the ANC is adapted during periods of noise only [7, 10, 13]. 
When used in combination with small-sized arrays, e.g., in hearing aid applications, an additional robustness 
constraint [9, 10, 14, 15] is required to guarantee performance in the presence of small errors in the assumed 
signal model, such as microphone mismatch [16, 17]. A widely applied method consists of imposing a 
Quadratic Inequality Constraint to the ANC (QIC-GSC) [10, 14, 15, 18, 19]. For LMS updating, the Scaled 
Projection Algorithm (SPA) [14] is a simple and effective technique that imposes this constraint. However,
the QIC-GSC achieves this robustness at the expense of less noise reduction [17].

In [20], a Multi-channel Wiener Filtering (MWF) technique has been proposed that provides a Minimum 
Mean Square Error (MMSE) estimate of the desired signal portion in one of the received microphone signals 
[21]-[24]. In contrast to the ANC of the GSC, the MWF is able to take speech distortion into account in 
its optimization criterion. The MMSE optimization criterion of the MWF can also be generalized to allow 
for a trade-off between speech distortion and noise reduction. We will refer to this generalization as Speech 
Distortion Weighted MWF (SDW-MWF). The MWF technique is based solely on estimates of the second
order statistics of the recorded speech signal and the noise signal. A robust speech detection is thus (again) 
needed. In contrast to the GSC, the MWF does not make any a priori assumptions about the signal model so 
that no or a less severe robustness constraint is needed to guarantee performance when used in combination 
with small-sized arrays [16, 17]. Especially in complicated noise scenarios such as multiple noise sources
or diffuse noise, the MWF outperforms the GSC, even when the GSC is supplemented with a robustness
constraint [17].

In [20, 21], the implementation of the MWF is based on a Generalized Singular Value Decomposition 
(GSVD) of an input data matrix and a noise data matrix. A cheaper alternative based on a QR Decomposition
(QRD) has been proposed in [22]. A subband implementation [23] results in improved intelligibility
at a significantly lower cost compared to the fullband approach. However, in contrast to the GSC and the 
QIC-GSC [14], no efficient, cheap stochastic gradient based implementation of the (SDW-)MWF, which 
avoids the use of expensive matrix computations, is available yet. In [25], an LMS based algorithm for the
MWF has been developed. The algorithm needs recordings of calibration signals. Since room acoustics, 
microphone characteristics and the location of the desired speaker change over time, frequent re-calibration 
is required, making this approach cumbersome and expensive. In [26], an LMS based SDW-MWF has 
been proposed that avoids the need for calibration signals. The algorithm, however, relies on some independence
assumptions that are not necessarily satisfied, resulting in degraded performance with respect to matrix-based
implementations.

II. Summary 

In the present invention, we establish a generalized multi-channel noise reduction scheme, referred to 
as Spatially Pre-processed Speech Distortion Weighted Multi-channel Wiener Filter (SP-SDW-MWF), that 
encompasses the GSC and the MWF as extreme cases. In addition, the scheme allows for in-between 
solutions such as the Speech Distortion Regularized GSC (SDR-GSC). The generalized scheme, depicted 
in Figure 3, consists of a fixed, spatial pre-processor and an adaptive stage that is based on an SDW-MWF, 
hence the name Spatially Pre-processed Speech Distortion Weighted Multi-channel Wiener filter (SP-SDW- 
MWF). 

The SP-SDW-MWF adds robustness against signal model errors to the GSC by taking speech distortion 
explicitly into account in the design criterion of the adaptive stage. The SP-SDW-MWF is an alternative 
technique to the widely studied QIC-GSC to decrease the sensitivity of the GSC to signal model errors 
such as microphone mismatch and reverberation. A parameter $\mu$ is incorporated in the SP-SDW-MWF that
allows for a trade-off between speech distortion and noise reduction. Focussing all attention on speech
distortion (i.e., setting $\mu = 0$) results in the output of the fixed beamformer. In noise scenarios with very
low Signal-to-Noise Ratio (SNR), e.g., -10 dB, a fixed beamformer may be preferred. Adaptivity can then
be easily reduced or excluded in the SP-SDW-MWF by decreasing the parameter $\mu$ to 0. Compared to the
widely studied QIC-GSC, the SP-SDW-MWF achieves a better noise reduction performance for a given 
maximum allowable speech distortion level. 

In [22, 27] recursive implementations of the (SDW-)MWF have been proposed based on a GSVD or QR 
decomposition. A subband implementation [28] results in improved intelligibility at a significantly lower 
cost compared to the fullband approach. These techniques can be extended to implement the SP-SDW- 
MWF [29]. However, in contrast to the GSC and the QIC-GSC [14], no cheap stochastic gradient based 
implementation of the SP-SDW-MWF is available. In the present invention, we propose time-domain and
frequency-domain stochastic gradient implementations of the SP-SDW-MWF that preserve the benefit of 
the matrix-based SP-SDW-MWF over QIC-GSC. 

Below, the different embodiments of the present invention are described.

A first embodiment proposes a Speech Distortion Regularized GSC (SDR-GSC). A new design criterion 
is developed for the adaptive stage of the GSC: the ANC design criterion is supplemented with a regularization
term that limits speech distortion due to signal model errors. In the SDR-GSC, a parameter $\mu$ is
incorporated that allows for a trade-off between speech distortion and noise reduction. Focussing all attention
on noise reduction results in the standard GSC, while, on the other hand, focussing all attention on
speech distortion results in the output of the fixed beamformer. In noise scenarios with low SNR, adaptivity
in the SDR-GSC can be easily reduced or excluded by increasing attention towards speech distortion, i.e., by
decreasing the parameter $\mu$ to 0. The SDR-GSC is an alternative technique to the QIC-GSC to decrease the
sensitivity of the GSC to signal model errors such as microphone mismatch and reverberation. In contrast to
the QIC-GSC, the SDR-GSC shifts emphasis towards speech distortion when the amount of speech leakage 
grows. In the absence of signal model errors, the performance of the GSC is preserved. As a result, a better 
noise reduction performance is obtained for small model errors, while guaranteeing robustness against large 
model errors. 

In a second embodiment, we further improve the noise reduction performance of the SDR-GSC by 
adding an extra adaptive filtering operation $\mathbf{w}_0$ on the speech reference signal. We refer to this generalized
scheme as Spatially Pre-processed Speech Distortion Weighted Multi-channel Wiener Filter (SP-SDW-MWF).
The SP-SDW-MWF is depicted in Figure 3 and encompasses the MWF [20] as a special case. Again,
a parameter $\mu$ is incorporated in the design criterion to allow for a trade-off between speech distortion and
noise reduction. Focussing all attention on speech distortion results in the output of the fixed beamformer.
Also here, adaptivity can be easily reduced or excluded by decreasing $\mu$ to 0. It is shown that, in the absence
of speech leakage and for infinitely long filter lengths, the SP-SDW-MWF corresponds to a cascade
of an SDR-GSC with an SDW single-channel Wiener postfilter (SDW-SWF) [30] and thus outperforms the
SDR-GSC. In the presence of speech leakage, the SP-SDW-MWF with $\mathbf{w}_0$ tries to preserve its performance:
compared to a SDR-GSC (with SDW-SWF postfilter), the SP-SDW-MWF then contains extra filtering op- 
erations that compensate for the performance degradation of the SDR-GSC (with SDW-SWF) due to speech 
leakage (see also Figure 4). In contrast to the SDR-GSC (and thus also the GSC), performance does not 
degrade due to microphone mismatch. In [22, 27] recursive implementations of the (SDW-)MWF have been 
proposed based on a GSVD or QR decomposition. A subband implementation [28] results in improved 
intelligibility at a significantly lower cost compared to the fullband approach. These techniques can be 
extended to implement the SDR-GSC and, more generally, the SP-SDW-MWF. 

In a third embodiment, we propose cheap time-domain and frequency-domain stochastic gradient im- 
plementations of the SDR-GSC and SP-SDW-MWF. Starting from the design criterion of the SDR-GSC, 
or more generally, the SP-SDW-MWF, we derive a time-domain stochastic gradient algorithm. In addition,
we modify the LMS based algorithm [26] so that it applies to the SP-SDW-MWF. To increase convergence
speed and reduce complexity, a frequency-domain implementation is proposed. Both the stochastic gradient
and the LMS based algorithm suffer from a large excess error when applied in highly time-varying noise
scenarios. We show that the excess error in the stochastic gradient algorithm is reduced by applying a low 
pass filter to the part of the gradient estimate that limits speech distortion. The low pass filtering avoids 
a highly time-varying distortion of the desired speech component while not degrading the tracking perfor- 
mance needed in time-varying noise scenarios. The stochastic gradient SP-SDW-MWF outperforms the 
LMS based algorithm, while complexity is not increased. Experimental results show that the low pass filter- 
ing significantly improves the performance of the stochastic gradient algorithm and does not compromise the 
tracking of changes in the noise scenario. In addition, experiments demonstrate that the proposed stochastic 
gradient algorithm preserves the benefit of the SP-SDW-MWF over QIC-GSC. The limited computational 
cost and the better noise reduction performance of the proposed algorithm make it a good alternative to the 
SPA [14] for implementation in hearing aids. 



Brief Description of the Drawings 

A number of embodiments of the present invention, together with some aspects of 
the prior art will now be described with reference to the drawings, in which: 

Fig. 1 depicts the concept of a Generalized Sidelobe Canceller; 

Fig. 2 depicts an equivalent approach of multi-channel Wiener filtering; 
Fig. 3 depicts a Spatially Pre-processed SDW MWF;

Fig. 4 depicts the decomposition of SP-SDW-MWF with $\mathbf{w}_0$ in a multi-channel filter
$\mathbf{w}_d$ and single-channel postfilter $\mathbf{e}_1 - \mathbf{w}_0$;

Fig. 5 shows the influence of $1/\mu$ on the performance of the SDR GSC for different
gain mismatches $\Upsilon_2$ at the second microphone;

Fig. 6 shows the influence of $1/\mu$ on the performance of the SP SDW MWF with $\mathbf{w}_0$
for different gain mismatches $\Upsilon_2$ at the second microphone;

Fig. 7 shows the $\Delta\mathrm{SNR}_{\mathrm{intellig}}$ and $\mathrm{SD}_{\mathrm{intellig}}$ for QIC-GSC as a function of $\beta^2$ for
different gain mismatches $\Upsilon_2$ at the second microphone;

Fig. 8 depicts the complexity of TD and FD Stochastic Gradient (SG) algorithm with 
LP filtering as a function of filter length L per channel; M = 3 (for comparison, the 
complexity of the standard NLMS ANC and SPA are depicted too); 

Fig. 9 depicts the performance of different FD Stochastic Gradient (FD-SG) 
algorithms; (a) Stationary speechlike noise at 90°; (b) Multi-talker babble noise at 90°; 

Fig. 10 depicts the influence of LP filter on performance of FD stochastic gradient 
SP-SDW-MWF ($1/\mu = 0.5$) without $\mathbf{w}_0$ and with $\mathbf{w}_0$. Babble noise at 90°;

Fig. 11 depicts the convergence behavior of FD-SG for $\lambda = 0$ and $\lambda = 0.9998$. The
noise source position suddenly changes from 90° to 180° and vice versa;

Fig. 12 depicts the performance of FD stochastic gradient implementation of SP-SDW-MWF
with LP ($\lambda = 0.9998$) in a multiple noise source scenario; and

Fig. 13 depicts the performance of FD SPA in a multiple noise source scenario. 

Detailed Description 

Before the invention is described in detail, the prior art GSC [4] and the QIC-GSC 
[14, 19] will be reviewed under section 1. Under section 2, the Multi-channel Wiener 
Filter (MWF) technique will be discussed [20]. 



1 Generalized Sidelobe Canceller (GSC) 



1.1 Concept 

Figure 1 describes the concept of the Generalized Sidelobe Canceller (GSC) [4], which consists of a fixed,
spatial pre-processor, i.e., a fixed beamformer A(z) and a blocking matrix B(z), and an ANC. Given M
microphone signals

$$u_i[k] = u_i^s[k] + u_i^n[k], \quad i = 1, \ldots, M \tag{1}$$

with $u_i^s[k]$ the desired speech contribution and $u_i^n[k]$ the noise contribution, the fixed beamformer A(z)
(e.g., delay-and-sum) creates a so-called speech reference

$$y_0[k] = y_0^s[k] + y_0^n[k], \tag{2}$$

by steering a beam towards the direction of the desired signal, with a speech contribution $y_0^s[k]$ and a noise
contribution $y_0^n[k]$. In the sequel, an endfire array is assumed and the desired speaker is assumed to be in
front at 0°. The blocking matrix B(z) creates M - 1 so-called noise references

$$y_i[k] = y_i^s[k] + y_i^n[k], \quad i = 1, \ldots, M-1 \tag{3}$$

by steering zeroes towards the front so that the noise contributions $y_i^n[k]$ are dominant compared to the
speech leakage contributions $y_i^s[k]$. In the sequel, the superscripts s and n are used to refer to the speech
and noise contribution of a signal. During periods of speech + noise, the references $y_i[k]$, $i = 0, \ldots, M-1$,
contain speech + noise. During periods of noise only, the references $y_i[k]$, $i = 0, \ldots, M-1$, only consist of a noise
component, i.e., $y_i[k] = y_i^n[k]$. The second order statistics of the noise signal are assumed to be sufficiently
stationary that they can be estimated during periods of noise only.

To design the fixed, spatial pre-processor, assumptions are made about the microphone characteristics, 
the speaker position and the microphone positions and furthermore reverberation is assumed to be absent. 
If these assumptions are satisfied, the noise references do not contain any speech, i.e., $y_i^s[k] = 0$ for
$i = 1, \ldots, M-1$. However, in practice, the assumptions are often violated (e.g., due to microphone
mismatch and reverberation) so that speech leaks into the noise references. To limit the effect of such signal 
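leakage, the ANC filter $\mathbf{w}_{1:M-1}$ (see footnote 1), with

$$\mathbf{w}_{1:M-1}^H = [\, \mathbf{w}_1^H \;\; \mathbf{w}_2^H \;\; \cdots \;\; \mathbf{w}_{M-1}^H \,] \tag{4}$$

where

$$\mathbf{w}_i = [\, w_i[0] \;\; w_i[1] \;\; \cdots \;\; w_i[L-1] \,]^T, \tag{5}$$

is adapted during periods of noise only [7, 13]. Hence, the ANC $\mathbf{w}_{1:M-1}$ minimizes the output noise power,
i.e.,

$$\mathbf{w}_{1:M-1} = \arg\min_{\mathbf{w}_{1:M-1}} \mathcal{E}\{ |y_0^n[k-\Delta] - \mathbf{w}_{1:M-1}^H \mathbf{y}_{1:M-1}^n[k]|^2 \} \tag{6}$$

and equals

$$\mathbf{w}_{1:M-1} = \mathcal{E}\{ \mathbf{y}_{1:M-1}^n[k]\, \mathbf{y}_{1:M-1}^{n,H}[k] \}^{-1}\, \mathcal{E}\{ \mathbf{y}_{1:M-1}^n[k]\, y_0^{n,*}[k-\Delta] \}, \tag{7}$$

where

$$\mathbf{y}_{1:M-1}^{n,H}[k] = [\, \mathbf{y}_1^{n,H}[k] \;\; \mathbf{y}_2^{n,H}[k] \;\; \cdots \;\; \mathbf{y}_{M-1}^{n,H}[k] \,] \tag{8}$$

$$\mathbf{y}_i^n[k] = [\, y_i^n[k] \;\; y_i^n[k-1] \;\; \cdots \;\; y_i^n[k-L+1] \,]^T \tag{9}$$

and where $\Delta$ is a delay applied to the speech reference to allow for non-causal taps in the filter $\mathbf{w}_{1:M-1}$. The
delay $\Delta$ is usually set to $\lceil L/2 \rceil$, where $\lceil x \rceil$ returns the smallest integer equal to or larger than x. The subscript
1 : M - 1 in $\mathbf{w}_{1:M-1}$ and $\mathbf{y}_{1:M-1}$ refers to the subscripts of the first and last channel component of the
adaptive filter and input vector, respectively.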
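
As an illustration of (7) to (9), the following minimal NumPy sketch estimates the ANC filter from noise-only data. It assumes real-valued time-domain signals and replaces the expectations by sample averages; the function names `tap_delay_matrix` and `anc_wiener` and the synthetic test signal are hypothetical and are not part of the described method.

```python
import numpy as np

def tap_delay_matrix(y, L):
    """Rows are the vectors [y[k], y[k-1], ..., y[k-L+1]] of (9), for k = L-1 .. N-1."""
    N = len(y)
    return np.stack([y[L - 1 - l : N - l] for l in range(L)], axis=1)

def anc_wiener(noise_refs, speech_ref, L, delay):
    """Sample-average estimate of the ANC filter (7) from noise-only data.

    noise_refs: array (M-1, N) holding the noise references y_1 .. y_{M-1}
    speech_ref: array (N,) holding the speech reference y_0
    delay:      delay applied to the speech reference (must not exceed L-1)
    """
    # Stacked input vector (8): one block of L taps per noise reference.
    Y = np.hstack([tap_delay_matrix(y, L) for y in noise_refs])
    # Delayed speech reference y_0[k - delay], aligned with the rows of Y.
    d = speech_ref[L - 1 - delay : len(speech_ref) - delay]
    R = Y.T @ Y / len(Y)   # estimate of the noise correlation matrix
    p = Y.T @ d / len(Y)   # estimate of the cross-correlation vector
    return np.linalg.solve(R, p)

# Synthetic noise-only example with one noise reference (M = 2).
rng = np.random.default_rng(0)
N, L, delay = 2000, 4, 2
y1 = rng.standard_normal(N)             # noise reference
y0 = np.convolve(y1, [0.5, -0.3])[:N]   # noise component of the speech reference
w = anc_wiener(y1[None, :], y0, L, delay)
```

In this synthetic case the noise in the speech reference is an exact two-tap filtering of the noise reference, so the estimated filter recovers those two taps shifted by the delay.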

Under ideal conditions ($y_i^s[k] = 0$, $i = 1, \ldots, M-1$), the GSC minimizes the residual noise while
not distorting the desired speech signal, i.e., $z^s[k] = y_0^s[k-\Delta]$. However, when used in combination with
small-sized arrays, a small error in the assumed signal model (hence $y_i^s[k] \neq 0$, $i = 1, \ldots, M-1$) already
suffices to produce a significantly distorted output speech signal $z^s[k]$,

$$z^s[k] = y_0^s[k-\Delta] - \mathbf{w}_{1:M-1}^H \mathbf{y}_{1:M-1}^s[k], \tag{10}$$

even when only adapting during noise-only periods, so a robustness constraint on $\mathbf{w}_{1:M-1}$ is required [17].
In addition, the fixed beamformer A(z) should be designed so that the distortion in the speech reference
$y_0^s[k]$ is minimal for all possible model errors. In the sequel, a delay-and-sum beamformer is used. For
small-sized arrays, this beamformer offers sufficient robustness against signal model errors, as it minimizes
the white noise gain or noise sensitivity (see footnote 2). Given statistical knowledge about the signal model errors that
occur in practice, further optimized beamformers can be designed, e.g., using the techniques in [31].
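
The effect of a model error on speech leakage can be made concrete with a small sketch. It assumes two microphone signals that are already time-aligned for 0°, a delay-and-sum beamformer realized as an average, and a blocking matrix that subtracts adjacent microphone signals; the function name `spatial_preprocessor`, the white-noise stand-in for speech and the 2 dB mismatch value are illustrative assumptions.

```python
import numpy as np

def spatial_preprocessor(u):
    """u: array (M, N) of microphone signals, already time-aligned for 0 degrees."""
    y0 = u.mean(axis=0)    # delay-and-sum speech reference
    y = u[:-1] - u[1:]     # pairwise differences: M-1 noise references
    return y0, y

rng = np.random.default_rng(0)
s = rng.standard_normal(1000)   # stand-in for the speech contribution

matched = np.vstack([s, s])                       # identical microphones
mismatched = np.vstack([s, 10 ** (2 / 20) * s])   # 2 dB gain mismatch on microphone 2

_, leak_matched = spatial_preprocessor(matched)
_, leak_mismatched = spatial_preprocessor(mismatched)

leak_power_matched = np.mean(leak_matched ** 2)        # no speech in the noise reference
leak_power_mismatched = np.mean(leak_mismatched ** 2)  # speech leaks into the noise reference
```

With matched microphones the noise reference contains no speech, whereas the gain mismatch produces a non-zero speech leakage term, which is the term that distorts the output in (10).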



1 In a time-domain implementation, the input signals of the adaptive filter $\mathbf{w}_{1:M-1}$ and the filter $\mathbf{w}_{1:M-1}$ itself are real. Hence,
$\mathbf{w}_{1:M-1}^H = \mathbf{w}_{1:M-1}^T$. In the sequel, the formulas are generalized to complex input signals so that they can also be applied to a
subband implementation.

2 The white noise gain or noise sensitivity is defined as the ratio of the spatially white noise gain to the gain of the desired signal
and is often used to quantify the sensitivity of an algorithm against errors in the assumed signal model [2, 14].



1.2 Quadratic Inequality Constraint (QIC-GSC) 

A common approach to increase the robustness of the GSC is to apply a Quadratic Inequality Constraint
(QIC) [9]-[14, 19] to the ANC filters $\mathbf{w}_{1:M-1}$, so that the optimization criterion (6) of the GSC is modified
into

$$\mathbf{w}_{1:M-1} = \arg\min_{\mathbf{w}_{1:M-1}} \mathcal{E}\{ |y_0^n[k-\Delta] - \mathbf{w}_{1:M-1}^H \mathbf{y}_{1:M-1}^n[k]|^2 \} \quad \text{subject to} \quad \mathbf{w}_{1:M-1}^H \mathbf{w}_{1:M-1} \le \beta^2. \tag{11}$$

The QIC avoids excessive growth of the filter coefficients $\mathbf{w}_{1:M-1}$. Hence, it reduces the undesired speech
distortion when speech leaks into the noise references. In [14, 19], it is shown that, for a GSC with a blocking
matrix $\mathbf{B}(f)$ that satisfies $\mathbf{B}^H(f)\mathbf{B}(f) = \mathbf{I}$, the QIC on the ANC filters corresponds to a constraint on the
noise sensitivity.

In [14], the QIC-GSC is implemented by using the adaptive scaled projection algorithm: at each update
step, the quadratic constraint is applied to the newly obtained ANC filter by scaling the filter coefficients
by $\beta / \|\mathbf{w}_{1:M-1}\|$ when $\mathbf{w}_{1:M-1}^H \mathbf{w}_{1:M-1}$ exceeds $\beta^2$. Although this technique works well for LMS updating, it
does not appear to be as effective for RLS as for LMS [19]. Recently, Tian et al. implemented the quadratic
constraint by using variable loading [19]. For RLS, this technique provides a better approximation to the 
optimal solution (11) than the scaled projection algorithm. For LMS, variable loading does not appear to 
offer any performance advantage over the cheaper, scaled projection LMS. 
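
The scaling step described above can be sketched as follows. This is a minimal illustration that assumes a normalized LMS update as the underlying adaptation rule; the function name `spa_nlms_step`, the step size, the regularization constant `eps` and the synthetic data are hypothetical choices, and in a GSC the update would only be executed during noise-only periods.

```python
import numpy as np

def spa_nlms_step(w, y, d, step, beta, eps=1e-8):
    """One NLMS update of the ANC filter followed by the scaled projection.

    w:    current stacked filter coefficients
    y:    stacked noise-reference input vector
    d:    delayed speech-reference sample
    beta: bound of the quadratic constraint w^H w <= beta^2
    """
    e = d - np.vdot(w, y)                                   # output sample d - w^H y
    w = w + step * y * np.conj(e) / (np.vdot(y, y).real + eps)
    norm = np.linalg.norm(w)
    if norm > beta:                                         # w^H w exceeds beta^2
        w = w * (beta / norm)                               # scale back onto the constraint
    return w, e

# Synthetic run: the unconstrained optimum has norm 2, which exceeds beta.
rng = np.random.default_rng(1)
w = np.zeros(4)
beta = 0.5
for _ in range(200):
    y = rng.standard_normal(4)
    d = 2.0 * y[0]
    w, _ = spa_nlms_step(w, y, d, step=0.5, beta=beta)
```

After every update the filter norm stays within the bound, so the coefficients cannot grow in response to speech leakage, at the cost of not reaching the unconstrained minimum.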

2 Multi-channel Wiener filtering (MWF) 
2.1 Concept

Recently, a Multi-channel Wiener filtering (MWF) technique has been proposed that provides a Minimum 
Mean Square Error (MMSE) estimate of the desired signal portion in one of the received microphone signals 
[21, 22, 23, 24]. In contrast to the GSC, this filtering technique does not make any a priori assumptions about
the signal model and is found to be more robust [16, 17, 21]. Especially in complicated noise scenarios such 
as multiple noise sources or diffuse noise, the MWF outperforms the GSC, even when the GSC is supplied 
with a robustness constraint [17]. 

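The MWF $\bar{\mathbf{w}}_{1:M} \in \mathbb{C}^{ML \times 1}$ minimizes the Mean Square Error (MSE) between a delayed version of the
(unknown) speech signal $u_i^s[k-\Delta]$ at the i-th (e.g., first) microphone and the sum $\bar{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}[k]$ of the M
filtered, received microphone signals:

$$\bar{\mathbf{w}}_{1:M} = \arg\min_{\bar{\mathbf{w}}_{1:M}} \mathcal{E}\{ |u_i^s[k-\Delta] - \bar{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}[k]|^2 \}, \tag{12}$$

leading to:

$$\bar{\mathbf{w}}_{1:M} = \mathcal{E}\{ \mathbf{u}_{1:M}[k]\, \mathbf{u}_{1:M}^H[k] \}^{-1}\, \mathcal{E}\{ \mathbf{u}_{1:M}[k]\, u_i^{s,*}[k-\Delta] \}, \tag{13}$$

with

$$\bar{\mathbf{w}}_{1:M}^H = [\, \bar{\mathbf{w}}_1^H \;\; \bar{\mathbf{w}}_2^H \;\; \cdots \;\; \bar{\mathbf{w}}_M^H \,], \tag{14}$$

$$\mathbf{u}_{1:M}^H[k] = [\, \mathbf{u}_1^H[k] \;\; \mathbf{u}_2^H[k] \;\; \cdots \;\; \mathbf{u}_M^H[k] \,], \tag{15}$$

$$\mathbf{u}_i[k] = [\, u_i[k] \;\; u_i[k-1] \;\; \cdots \;\; u_i[k-L+1] \,]^T. \tag{16}$$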


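An equivalent approach consists in estimating a delayed version of the (unknown) noise signal $u_i^n[k-\Delta]$
in the i-th microphone, resulting in

$$\mathbf{w}_{1:M} = \arg\min_{\mathbf{w}_{1:M}} \mathcal{E}\{ |u_i^n[k-\Delta] - \mathbf{w}_{1:M}^H \mathbf{u}_{1:M}[k]|^2 \}, \tag{17}$$

and

$$\mathbf{w}_{1:M} = \mathcal{E}\{ \mathbf{u}_{1:M}[k]\, \mathbf{u}_{1:M}^H[k] \}^{-1}\, \mathcal{E}\{ \mathbf{u}_{1:M}[k]\, u_i^{n,*}[k-\Delta] \}, \tag{18}$$

where

$$\mathbf{w}_{1:M}^H = [\, \mathbf{w}_1^H \;\; \mathbf{w}_2^H \;\; \cdots \;\; \mathbf{w}_M^H \,]. \tag{19}$$

The estimate of the speech component $\hat{u}_i^s[k-\Delta]$ is then obtained by subtracting the estimate $\hat{u}_i^n[k-\Delta] =
\mathbf{w}_{1:M}^H \mathbf{u}_{1:M}[k]$ from the delayed, i-th microphone signal $u_i[k-\Delta]$, i.e.,

$$\hat{u}_i^s[k-\Delta] = u_i[k-\Delta] - \mathbf{w}_{1:M}^H \mathbf{u}_{1:M}[k]. \tag{20}$$

This is depicted in Figure 2 for $i = 1$. Using (13) and (18), it can be easily shown that

$$\bar{\mathbf{w}}_{1:M} + \mathbf{w}_{1:M} = \mathbf{e}_{(i-1)L+\Delta}, \tag{21}$$

with $\mathbf{e}_l$ the $l$-th canonical vector, defined as

$$\mathbf{e}_l = [\, 0 \;\; \cdots \;\; 0 \;\; 1 \;\; 0 \;\; \cdots \;\; 0 \,]^T, \quad \text{with the 1 at position } l. \tag{22}$$

This shows that the two approaches indeed lead to exactly the same speech signal estimate. A procedure for
computing $\bar{\mathbf{w}}_{1:M}$ or $\mathbf{w}_{1:M}$ will be given in Section 2.3.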



2.2 Trade-off speech distortion versus noise reduction (SDW-MWF) 

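The residual error energy equals

$$\mathcal{E}\{|e[k]|^2\} = \mathcal{E}\{ |u_i^s[k-\Delta] - \bar{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}[k]|^2 \}, \tag{23}$$

and can be decomposed as

$$\underbrace{\mathcal{E}\{ |u_i^s[k-\Delta] - \bar{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}^s[k]|^2 \}}_{\varepsilon_d^2} + \underbrace{\mathcal{E}\{ |\bar{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}^n[k]|^2 \}}_{\varepsilon_n^2} \tag{24}$$

where $\varepsilon_d^2$ equals the speech distortion energy and $\varepsilon_n^2$ the residual noise energy. The design criterion of
the MWF can be generalized to allow for a trade-off between speech distortion and noise reduction, by
incorporating a weighting factor $\mu$ [20] with $\mu \in [0, \infty]$:

$$\bar{\mathbf{w}}_{1:M} = \arg\min_{\bar{\mathbf{w}}_{1:M}} \mathcal{E}\{ |u_i^s[k-\Delta] - \bar{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}^s[k]|^2 \} + \mu\, \mathcal{E}\{ |\bar{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}^n[k]|^2 \}. \tag{25}$$

The solution of (25) is given by

$$\bar{\mathbf{w}}_{1:M} = \mathcal{E}\{ \mathbf{u}_{1:M}^s[k]\, \mathbf{u}_{1:M}^{s,H}[k] + \mu\, \mathbf{u}_{1:M}^n[k]\, \mathbf{u}_{1:M}^{n,H}[k] \}^{-1}\, \mathcal{E}\{ \mathbf{u}_{1:M}^s[k]\, u_i^{s,*}[k-\Delta] \}, \tag{26}$$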

which corresponds to the Wiener formula with an adjustable input noise level. Note that (13) is obtained
with $\mu = 1$ and that (21) still applies. The filter (26) corresponds to the time-domain constrained estimator
proposed in [32], which optimizes the following criterion:



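$$\min\, \varepsilon_d^2 \quad \text{subject to} \quad \varepsilon_n^2 \le \alpha\, \mathcal{E}\{ \mathbf{u}_{1:M}^{n,H}\, \mathbf{u}_{1:M}^n \}$$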



(27) 



where $0 < \alpha < 1$ and $\mu$ is the Lagrange multiplier.

Equivalently, the optimization criterion for $\mathbf{w}_{1:M}$ in (17) can be modified into

$$\mathbf{w}_{1:M} = \arg\min_{\mathbf{w}_{1:M}} \frac{1}{\mu}\, \mathcal{E}\{ |\mathbf{w}_{1:M}^H \mathbf{u}_{1:M}^s[k]|^2 \} + \mathcal{E}\{ |u_i^n[k-\Delta] - \mathbf{w}_{1:M}^H \mathbf{u}_{1:M}^n[k]|^2 \}, \tag{28}$$

resulting in

$$\mathbf{w}_{1:M} = \mathcal{E}\Big\{ \mathbf{u}_{1:M}^n[k]\, \mathbf{u}_{1:M}^{n,H}[k] + \frac{1}{\mu}\, \mathbf{u}_{1:M}^s[k]\, \mathbf{u}_{1:M}^{s,H}[k] \Big\}^{-1}\, \mathcal{E}\{ \mathbf{u}_{1:M}^n[k]\, u_i^{n,*}[k-\Delta] \}. \tag{29}$$

In the sequel, we will refer to (29) as the Speech Distortion Weighted Multi-channel Wiener Filter (SDW-MWF).

The factor $\mu \in [0, \infty]$ trades off speech distortion versus noise reduction. If $\mu = 1$, the MMSE criterion
(12) or (17) is obtained. If $\mu > 1$, the residual noise level will be reduced at the expense of increased speech
distortion. By setting $\mu$ to $\infty$, all emphasis is put on noise reduction and speech distortion is completely
ignored. This results in $\bar{\mathbf{w}}_{1:M} = \mathbf{0}$ or $\mathbf{w}_{1:M} = \mathbf{e}_{(i-1)L+\Delta}$, which means that the output signal equals 0. Setting $\mu$
to 0, on the other hand, results in $\bar{\mathbf{w}}_{1:M} = \mathbf{e}_{(i-1)L+\Delta}$ or $\mathbf{w}_{1:M} = \mathbf{0}$ and hence in no noise reduction.
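
The role of the weighting factor can be checked numerically with a minimal sketch of (26) and (29). It assumes that the speech and noise correlation matrices are known and real-valued, and it uses a 0-based index `idx` for the position of the delayed reference sample in the stacked input vector; the function name `sdw_mwf` and the synthetic matrices are hypothetical.

```python
import numpy as np

def sdw_mwf(Rs, Rn, idx, mu):
    """Noise-estimating SDW-MWF (29) and speech-estimating filter (26).

    Rs, Rn: speech and noise correlation matrices of size ML x ML
    idx:    0-based position of the delayed reference sample in the input vector
    mu:     trade-off between speech distortion and noise reduction
    """
    e = np.zeros(Rs.shape[0])
    e[idx] = 1.0                                        # canonical vector
    w_noise = np.linalg.solve(Rn + Rs / mu, Rn @ e)     # (29)
    w_speech = np.linalg.solve(Rs + mu * Rn, Rs @ e)    # (26)
    return w_noise, w_speech, e

# Synthetic, positive definite correlation matrices.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
B = rng.standard_normal((6, 6))
Rs = A @ A.T + 0.1 * np.eye(6)
Rn = B @ B.T + np.eye(6)

w_noise, w_speech, e = sdw_mwf(Rs, Rn, idx=2, mu=2.0)
w_large, _, _ = sdw_mwf(Rs, Rn, idx=2, mu=1e8)    # all emphasis on noise reduction
w_small, _, _ = sdw_mwf(Rs, Rn, idx=2, mu=1e-8)   # all emphasis on speech distortion
```

For any value of the weighting factor the two filters add up to the canonical vector, as in (21). For a very large factor the noise-estimating filter approaches the canonical vector (zero output), and for a very small factor it approaches zero (no noise reduction).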

2.3 Implementation of MWF 

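In practice, the correlation matrix $\mathcal{E}\{ \mathbf{u}_{1:M}^s[k]\, \mathbf{u}_{1:M}^{s,H}[k] \}$ is unknown. During periods of speech, the inputs
$u_i[k]$ consist of speech + noise, i.e., $u_i[k] = u_i^s[k] + u_i^n[k]$, $i = 1, \ldots, M$. During periods of noise,
only the noise component $u_i^n[k]$ is observed. Assuming that the speech and noise signals are uncorrelated,
$\mathcal{E}\{ \mathbf{u}_{1:M}^s[k]\, \mathbf{u}_{1:M}^{s,H}[k] \}$ can be estimated as

$$\mathcal{E}\{ \mathbf{u}_{1:M}^s[k]\, \mathbf{u}_{1:M}^{s,H}[k] \} = \mathcal{E}\{ \mathbf{u}_{1:M}[k]\, \mathbf{u}_{1:M}^H[k] \} - \mathcal{E}\{ \mathbf{u}_{1:M}^n[k]\, \mathbf{u}_{1:M}^{n,H}[k] \}, \tag{30}$$

where the second order statistics $\mathcal{E}\{ \mathbf{u}_{1:M}[k]\, \mathbf{u}_{1:M}^H[k] \}$ are estimated during speech + noise and the statistics
$\mathcal{E}\{ \mathbf{u}_{1:M}^n[k]\, \mathbf{u}_{1:M}^{n,H}[k] \}$ during periods of noise only. As for the GSC, a robust speech detection is thus needed.
Using (30), (29) and (26) can be re-written as:

$$\mathbf{w}_{1:M} = \Big( \frac{1}{\mu}\, \mathcal{E}\{ \mathbf{u}_{1:M}[k]\, \mathbf{u}_{1:M}^H[k] \} + \Big(1 - \frac{1}{\mu}\Big)\, \mathcal{E}\{ \mathbf{u}_{1:M}^n[k]\, \mathbf{u}_{1:M}^{n,H}[k] \} \Big)^{-1} \mathcal{E}\{ \mathbf{u}_{1:M}^n[k]\, u_i^{n,*}[k-\Delta] \} \tag{31}$$

and

$$\bar{\mathbf{w}}_{1:M} = \Big( \mathcal{E}\{ \mathbf{u}_{1:M}[k]\, \mathbf{u}_{1:M}^H[k] \} + (\mu - 1)\, \mathcal{E}\{ \mathbf{u}_{1:M}^n[k]\, \mathbf{u}_{1:M}^{n,H}[k] \} \Big)^{-1} \Big( \mathcal{E}\{ \mathbf{u}_{1:M}[k]\, u_i^{*}[k-\Delta] \} - \mathcal{E}\{ \mathbf{u}_{1:M}^n[k]\, u_i^{n,*}[k-\Delta] \} \Big). \tag{32}$$

In [21], the Wiener filter is computed at each time instant k by means of a Generalized Singular Value
Decomposition (GSVD) of a speech + noise and a noise data matrix. A cheaper recursive alternative based on
a QR-decomposition has been proposed in [22]. In [23, 24], a subband implementation has been developed
to increase intelligibility and reduce complexity, making it suitable for hearing aid applications.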
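
A direct batch evaluation of (31) can be sketched as follows. This is not one of the GSVD, QRD or subband implementations cited above; it simply replaces the expectations by sample averages and assumes that a speech detector output is available as a boolean mask. The function name `sdw_mwf_from_data`, the steering vector and the memoryless three-microphone scenario (L = 1, delay 0) are hypothetical.

```python
import numpy as np

def sdw_mwf_from_data(U, vad, idx, mu):
    """Batch estimate of the SDW-MWF (31).

    U:   array (N, ML); row k is the stacked input vector at time k
    vad: boolean array (N,), True during speech + noise, False during noise only
    idx: 0-based position of the delayed reference sample in the input vector
    """
    Ruu = U[vad].T @ U[vad].conj() / vad.sum()         # speech + noise statistics
    Rnn = U[~vad].T @ U[~vad].conj() / (~vad).sum()    # noise-only statistics
    rhs = Rnn[:, idx]                                  # noise cross-correlation vector
    return np.linalg.solve(Ruu / mu + (1 - 1 / mu) * Rnn, rhs)

# Synthetic scenario: 3 microphones, spatially white noise.
rng = np.random.default_rng(2)
N = 20000
a = np.ones(3)                        # hypothetical steering vector
vad = np.arange(N) < N // 2           # first half: speech + noise
s = rng.standard_normal(N) * vad      # speech is present only while vad is True
S = np.outer(s, a)                    # speech contribution per microphone
U = S + rng.standard_normal((N, 3))   # microphone signals

w = sdw_mwf_from_data(U, vad, idx=0, mu=1.0)
z = U[:, 0] - U @ w.conj()            # speech estimate, as in (20)
mse_in = np.mean((U[vad, 0] - S[vad, 0]) ** 2)
mse_out = np.mean((z[vad] - S[vad, 0]) ** 2)
```

In this example the error of the speech estimate at the filter output is clearly lower than the error of the unprocessed first microphone signal. Larger values of the weighting factor would reduce the residual noise further at the cost of more speech distortion.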

Finally, note that instead of estimating $\mathcal{E}\{ \mathbf{u}_{1:M}^s[k]\, \mathbf{u}_{1:M}^{s,H}[k] \}$ online using (30), a pre-determined estimate
of $\mathcal{E}\{ \mathbf{u}_{1:M}^s[k]\, \mathbf{u}_{1:M}^{s,H}[k] \}$ is sometimes used [25, 33]. In [25], this estimate is derived from clean speech
recordings measured during an initial calibration phase. Additional recordings of the source speech signal
make it possible to produce an estimate of the non-reverberant source speech signal instead of an estimate of the
reverberant speech component in one of the microphone signals. However, since the room acoustics, the 
position of desired speaker and microphone characteristics may change over time, frequent re-calibration 
is required. In [33], a mathematical estimate of the correlation matrix and the correlation vector of the 
non-reverberant speech is exploited in which some signal model errors are taken into account. 

In this Section, the present invention is described in detail.

In Section 3, the proposed adaptive multi-channel noise reduction technique, referred to as Spatially 
Pre-processed Speech Distortion Weighted Multi-channel Wiener filter, is described. 

Section 3.2 describes a first embodiment, referred to as Speech Distortion Regularized GSC (SDR- 
GSC). A new design criterion is developed for the adaptive stage of the GSC: the ANC design criterion is 
supplemented with a regularization term that limits speech distortion due to signal model errors. In the SDR-GSC,
a parameter $\mu$ is incorporated that allows for a trade-off between speech distortion and noise reduction.
Focussing all attention on noise reduction results in the standard GSC, while, on the other hand, focussing
all attention on speech distortion results in the output of the fixed beamformer. In noise scenarios with
low SNR, adaptivity in the SDR-GSC can be easily reduced or excluded by increasing attention towards
speech distortion, i.e., by decreasing the parameter $\mu$ to 0. The SDR-GSC is an alternative technique to
the QIC-GSC to decrease the sensitivity of the GSC to signal model errors such as microphone mismatch and
reverberation. In contrast to the QIC-GSC, the SDR-GSC shifts emphasis towards speech distortion
when the amount of speech leakage grows. In the absence of signal model errors, the performance of the 
GSC is preserved. As a result, a better noise reduction performance is obtained for small model errors, while 
guaranteeing robustness against large model errors. 

In a second embodiment, described in Section 3.3, we further improve the noise reduction performance 
of the SDR-GSC by adding an extra adaptive filtering operation $\mathbf{w}_0$ on the speech reference signal. We refer
to this generalized scheme as Spatially Pre-processed Speech Distortion Weighted Multi-channel Wiener
Filter (SP-SDW-MWF). The SP-SDW-MWF is depicted in Figure 3 and encompasses the MWF as a special
case. Again, a parameter $\mu$ is incorporated in the design criterion to allow for a trade-off between speech
distortion and noise reduction. Focussing all attention on speech distortion results in the output of the fixed
beamformer. Also here, adaptivity can be easily reduced or excluded by decreasing $\mu$ to 0. It is shown
that, in the absence of speech leakage and for infinitely long filter lengths, the SP-SDW-MWF corresponds
to a cascade of an SDR-GSC with an SDW-SWF postfilter. In the presence of speech leakage, the SP-SDW-MWF
with $\mathbf{w}_0$ tries to preserve its performance: compared to an SDR-GSC with SDW-SWF postfilter, the
SP-SDW-MWF then contains extra filtering operations that compensate for the performance degradation of
the SDR-GSC with SDW-SWF due to speech leakage. In contrast to the SDR-GSC (and thus also the GSC),
performance does not degrade due to microphone mismatch. In [22, 27] recursive implementations of the
(SDW-)MWF have been proposed based on a GSVD or QR decomposition. A subband implementation [28]
results in improved intelligibility at a significantly lower cost compared to the fullband approach. These
techniques (see footnote 3) can be extended to implement the SDR-GSC and, more generally, the SP-SDW-MWF.

3 The implementation based on GSVD can only be used for the SP-SDW-MWF with filter $\mathbf{w}_0$.

In a third embodiment, described in Section 4, we propose cheap time-domain and frequency-domain
stochastic gradient implementations of the SDR-GSC and SP-SDW-MWF. Starting from the design criterion
of the SDR-GSC, or more generally, the SP-SDW-MWF, we derive a time-domain stochastic gradient
algorithm. In addition, we modify the LMS based algorithm [26] so that it applies to the SP-SDW-MWF. To
increase convergence speed and reduce complexity, a frequency-domain implementation is proposed. Both
the stochastic gradient and the LMS based algorithm suffer from a large excess error when applied in highly
time-varying noise scenarios. We show that the excess error in the stochastic gradient algorithm is reduced
by applying a low pass filter to the part of the gradient estimate that limits speech distortion. The low
pass filtering avoids a highly time-varying distortion of the desired speech component while not degrading
the tracking performance needed in time-varying noise scenarios. The stochastic gradient SP-SDW-MWF
outperforms the LMS based algorithm, while complexity is not increased. Experimental results show that
the low pass filtering significantly improves the performance of the stochastic gradient algorithm and does
not compromise the tracking of changes in the noise scenario. In addition, experiments demonstrate that
the proposed stochastic gradient algorithm preserves the benefit of the SP-SDW-MWF over QIC-GSC. The
limited computational cost and the better noise reduction performance of the proposed algorithm make it a
good alternative to the SPA [14] for implementation in hearing aids.

3 Spatially pre-processed SDW Multi-channel Wiener filter 

3.1 Concept 

Figure 3 describes the Spatially Pre-processed, Speech Distortion Weighted Multi-channel Wiener Filter
(SP-SDW-MWF). The SP-SDW-MWF consists of a fixed, spatial pre-processor, i.e., a fixed beamformer
$A(z)$ and a blocking matrix $B(z)$, and an adaptive Speech Distortion Weighted Multi-channel Wiener Filter
(SDW-MWF). Given $M$ microphone signals

$$u_i[k] = u_i^s[k] + u_i^n[k], \quad i = 1, \ldots, M \qquad (33)$$

with $u_i^s[k]$ the desired speech contribution and $u_i^n[k]$ the noise contribution, the fixed beamformer $A(z)$
creates a so-called speech reference

$$y_0[k] = y_0^s[k] + y_0^n[k] \qquad (34)$$

by steering a beam towards the direction of the desired signal, with a speech contribution $y_0^s[k]$ and a noise
contribution $y_0^n[k]$. In the sequel, an endfire array is assumed and the desired speaker is assumed to be in front
at 0°. To preserve the robustness advantage of the MWF, the fixed beamformer $A(z)$ should be designed
so that the distortion in the speech reference $y_0^s[k]$ is minimal for all possible errors in the assumed signal
model, such as microphone mismatch. In the sequel, a delay-and-sum beamformer is used. For small-sized
arrays, this beamformer offers sufficient robustness against signal model errors as it minimizes the white
noise gain or noise sensitivity 4. Given statistical knowledge about the signal model errors that occur in
practice, a further optimized beamformer $A(z)$ can be designed, e.g., using the techniques in [31].

4 The white noise gain or noise sensitivity is defined as the ratio of the spatially white noise gain to the gain of the desired signal
and is often used to quantify the sensitivity of an algorithm against errors in the assumed signal model [2, 14].

The blocking matrix $B(z)$ creates $M-1$ so-called noise references

$$y_i[k] = y_i^s[k] + y_i^n[k], \quad i = 1, \ldots, M-1 \qquad (35)$$

by steering zeroes towards the front so that the noise contributions $y_i^n[k]$ are dominant compared to the
speech leakage contributions $y_i^s[k]$. A simple technique to create the noise references consists of pairwise
subtracting the microphone signals after time-aligning them for 0°. Using [31, 34], further optimized noise
references can be created. Speech leakage can then be minimized for a specified angular region around 0° instead of
for 0° only, e.g., for an angular region from -20° to 20°. In addition, given statistical knowledge about the
signal model errors that occur in practice, speech leakage can be minimized for all possible model errors by
using [31].
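
The simple spatial pre-processor described above (delay-and-sum beamformer plus pairwise subtraction) can be sketched as follows. This is only an illustration under stated assumptions: the microphone signals are taken to be real-valued, equally long and already time-aligned for 0°, the function names `delay_and_sum` and `pairwise_blocking` are hypothetical, and the 2 dB gain mismatch is an arbitrary example value.

```python
def delay_and_sum(aligned):
    """Speech reference y_0[k]: average of the M time-aligned microphone signals."""
    m = len(aligned)
    return [sum(samples) / m for samples in zip(*aligned)]


def pairwise_blocking(aligned):
    """Noise references y_i[k], i = 1..M-1: differences of adjacent aligned signals."""
    return [
        [a - b for a, b in zip(aligned[i], aligned[i + 1])]
        for i in range(len(aligned) - 1)
    ]


# Toy example: the same speech samples reach all three aligned microphones.
speech = [1.0, -2.0, 0.5, 3.0]
matched = [speech, speech, speech]

# Assumed (illustrative) 2 dB gain mismatch at the second microphone.
gain = 10 ** (2.0 / 20.0)
mismatched = [speech, [gain * s for s in speech], speech]

y0 = delay_and_sum(matched)
leak_matched = pairwise_blocking(matched)        # no speech leakage
leak_mismatched = pairwise_blocking(mismatched)  # speech leaks into the noise references
```

With matched microphones the noise references contain no speech, whereas the gain mismatch makes speech leak into both noise references, which is the situation the regularization of the adaptive stage is meant to handle.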

In the sequel, the superscripts $s$ and $n$ are used to refer to the speech and noise contribution of a signal.
During periods of speech + noise, the references $y_i[k]$, $i = 0, \ldots, M-1$ contain speech + noise. During
periods of noise only, $y_i[k]$, $i = 0, \ldots, M-1$ only consist of a noise component, i.e., $y_i[k] = y_i^n[k]$. The
second order statistics of the noise signal are assumed to be sufficiently stationary such that they can be
estimated during periods of noise only.

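The SDW-MWF filter 5 $\mathbf{w}_{0:M-1}$

$$\mathbf{w}_{0:M-1} = \left( \frac{1}{\mu}\,\mathcal{E}\{\mathbf{y}^s_{0:M-1}[k]\,\mathbf{y}^{s,H}_{0:M-1}[k]\} + \mathcal{E}\{\mathbf{y}^n_{0:M-1}[k]\,\mathbf{y}^{n,H}_{0:M-1}[k]\} \right)^{-1} \mathcal{E}\{\mathbf{y}^n_{0:M-1}[k]\, y_0^{n,*}[k-\Delta]\} \qquad (36)$$

with

$$\mathbf{w}^H_{0:M-1}[k] = [\, \mathbf{w}_0^H[k] \;\; \mathbf{w}_1^H[k] \;\; \cdots \;\; \mathbf{w}_{M-1}^H[k] \,], \qquad (37)$$

$$\mathbf{w}_i[k] = [\, w_i[0] \;\; w_i[1] \;\; \cdots \;\; w_i[L-1] \,]^T, \qquad (38)$$

$$\mathbf{y}^H_{0:M-1}[k] = [\, \mathbf{y}_0^H[k] \;\; \mathbf{y}_1^H[k] \;\; \cdots \;\; \mathbf{y}_{M-1}^H[k] \,], \qquad (39)$$

$$\mathbf{y}_i[k] = [\, y_i[k] \;\; y_i[k-1] \;\; \cdots \;\; y_i[k-L+1] \,]^T, \qquad (40)$$

provides an estimate $\mathbf{w}^H_{0:M-1}\mathbf{y}_{0:M-1}[k]$ of the noise contribution $y_0^n[k-\Delta]$ 6 in the speech reference by
minimizing the cost function $J(\mathbf{w}_{0:M-1})$

$$J(\mathbf{w}_{0:M-1}) = \frac{1}{\mu}\underbrace{\mathcal{E}\{|\mathbf{w}^H_{0:M-1}\mathbf{y}^s_{0:M-1}[k]|^2\}}_{\varepsilon_d^2} + \underbrace{\mathcal{E}\{|y_0^n[k-\Delta] - \mathbf{w}^H_{0:M-1}\mathbf{y}^n_{0:M-1}[k]|^2\}}_{\varepsilon_n^2}. \qquad (41)$$

5 In a time-domain implementation, the input signals of the adaptive filter and the filter $\mathbf{w}_{0:M-1}$ are real and hence,
$\mathbf{w}^H_{0:M-1} = \mathbf{w}^T_{0:M-1}$. In the sequel, the formulas are generalized to complex input signals so that they can also be
applied to a subband implementation.

6 The delay $\Delta$ is applied to the speech reference to allow for non-causal taps in the filter $\mathbf{w}$. Usually, it is set to
$\lceil L/2 \rceil$, where $\lceil x \rceil$ returns the smallest integer equal to or larger than $x$.

The subscript $0:M-1$ in $\mathbf{w}_{0:M-1}$ and $\mathbf{y}_{0:M-1}$ refers to the subscripts of the first and last channel
component of the adaptive filter and input vector, respectively. The term $\varepsilon_d^2$ represents the speech distortion
energy and $\varepsilon_n^2$ the residual noise energy. The term $\frac{1}{\mu}\varepsilon_d^2$ in the cost function (41) limits the possible amount
of speech distortion at the output of the SP-SDW-MWF. Hence, the SP-SDW-MWF adds robustness against
signal model errors to the GSC by taking speech distortion explicitly into account in the design criterion of
the adaptive stage. The parameter $\frac{1}{\mu} \in [0, \infty)$ trades off between noise reduction and speech distortion: the
larger $1/\mu$, the smaller the amount of possible speech distortion. For $\mu = 0$, the output of the fixed beamformer
$A(z)$, delayed by $\Delta$ samples, is obtained. In noise scenarios with very low signal-to-noise ratio (SNR),
e.g., -10 dB, a fixed beamformer may be preferred. Adaptivity can be easily reduced or excluded in the
SP-SDW-MWF by decreasing $\mu$ to 0. Alternatively, adaptivity can be limited by applying a QIC to $\mathbf{w}_{0:M-1}$.
Note that when the fixed beamformer $A(z)$ and the blocking matrix $B(z)$ are set to

$$A(z) = [\, 1 \;\; 0 \;\; \cdots \;\; 0 \,]^H \qquad (42)$$

$$B(z) = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \ddots & \vdots \\ \vdots & \vdots & \ddots & \ddots & 0 \\ 0 & 0 & \cdots & 0 & 1 \end{bmatrix}^H \qquad (43)$$

we obtain the original SDW-MWF that operates on the received microphone signals $u_i[k]$, $i = 1, \ldots, M$.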

Below, the different parameter settings of the SP-SDW-MWF are discussed. Depending on the setting of
the parameter $\mu$ and the presence or absence of the filter $\mathbf{w}_0$, the GSC, the (SDW-)MWF as well as in-between
solutions such as the Speech Distortion Regularized GSC (SDR-GSC) may be obtained. We distinguish
between two cases, i.e., the case where no filter $\mathbf{w}_0$ is applied to the speech reference (filter length $L_0 = 0$)
and the case where an additional filter $\mathbf{w}_0$ is used ($L_0 \neq 0$).

The adaptive stage of the SP-SDW-MWF can be implemented using the recursive QRD-based
implementation of the SDW-MWF [22]. As for the SDW-MWF, complexity can be reduced by a subband
implementation [23]. For $L_0 \neq 0$, the GSVD based algorithm [20] can also be applied. Cheaper stochastic
gradient based algorithms are proposed in Section 4.

3.2 First embodiment: SDR-GSC, i.e., SP-SDW-MWF without w 0 

First, consider the case without $\mathbf{w}_0$, i.e., $L_0 = 0$. The solution for $\mathbf{w}_{1:M-1}$ in (36) then reduces to

$$\mathbf{w}_{1:M-1} = \arg\min_{\mathbf{w}_{1:M-1}} \frac{1}{\mu}\underbrace{\mathcal{E}\{|\mathbf{w}^H_{1:M-1}\mathbf{y}^s_{1:M-1}[k]|^2\}}_{\varepsilon_d^2} + \underbrace{\mathcal{E}\{|y_0^n[k-\Delta] - \mathbf{w}^H_{1:M-1}\mathbf{y}^n_{1:M-1}[k]|^2\}}_{\varepsilon_n^2}, \qquad (44)$$

leading to

$$\mathbf{w}_{1:M-1} = \left( \frac{1}{\mu}\,\mathcal{E}\{\mathbf{y}^s_{1:M-1}\mathbf{y}^{s,H}_{1:M-1}\} + \mathcal{E}\{\mathbf{y}^n_{1:M-1}\mathbf{y}^{n,H}_{1:M-1}\} \right)^{-1} \mathcal{E}\{\mathbf{y}^n_{1:M-1}\, y_0^{n,*}[k-\Delta]\}, \qquad (46)$$

where $\varepsilon_d^2$ is the speech distortion energy and $\varepsilon_n^2$ the residual noise energy.

Remark: For $L_0 = 0$, it is readily seen that [...] does not hold, because the speech component $\mathbf{y}^s_{1:M-1}[k]$
in the input to the adaptive filter $\mathbf{w}_{1:M-1}$ does not contain the estimated speech signal $y_0^s[k-\Delta]$.

If $\mu = 1$, the classical MMSE criterion (cf. (17)) is obtained.

Compared to the optimization criterion (6) of the GSC, a regularization term

$$\frac{1}{\mu}\,\mathcal{E}\{|\mathbf{w}^H_{1:M-1}\mathbf{y}^s_{1:M-1}[k]|^2\} \qquad (47)$$

has been added. This regularization term limits the amount of speech distortion that is caused by the filter
$\mathbf{w}_{1:M-1}$ when speech leaks into the noise references, i.e., $y_i^s[k] \neq 0$, $i = 1, \ldots, M-1$. In the sequel, we
therefore refer to the SP-SDW-MWF with $L_0 = 0$ as the Speech Distortion Regularized GSC (SDR-GSC). The
smaller $\mu$, the smaller the resulting amount of speech distortion will be. For $\mu = 0$, the output of the fixed
beamformer $A(z)$, delayed by $\Delta$ samples, is obtained. For $\mu = \infty$, all emphasis is put on noise reduction and
speech distortion is not taken into account. This corresponds to the GSC. Hence, the SDR-GSC encompasses
the GSC as a special case.

The regularization term (47) with $1/\mu \neq 0$ adds robustness to the GSC, while
not affecting the noise reduction performance in the absence of speech leakage.

• In the absence of speech leakage, i.e., $y_i^s[k] = 0$, $i = 1, \ldots, M-1$, the regularization term equals 0
for all $\mathbf{w}_{1:M-1}$ and hence the residual noise energy $\varepsilon_n^2$ is effectively minimized. In other words, in the
absence of speech leakage, the GSC solution is obtained.

• In the presence of speech leakage, i.e., $y_i^s[k] \neq 0$, $i = 1, \ldots, M-1$, speech distortion is taken
into account in the optimization criterion (44) for the adaptive filter $\mathbf{w}_{1:M-1}$, limiting speech distortion
while reducing noise. The larger the amount of speech leakage, the more attention is paid to speech
distortion.

As an alternative way to limit speech distortion, a QIC is often imposed on the filter $\mathbf{w}_{1:M-1}$ (see Section 1.2).
In contrast to the SDR-GSC, the QIC acts irrespective of the amount of speech leakage $\mathbf{y}^s[k]$ that is
present. The constraint value $\beta^2$ in (11) has to be chosen based on the largest model errors that may
occur. As a consequence, noise reduction performance is compromised even when no or very small
model errors are present. Hence, the QIC is more conservative than the SDR-GSC. The experimental
results in Section 3.4 confirm this.
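
To make the role of the regularization term concrete, the closed-form SDR-GSC solution can be evaluated for the smallest possible case: one noise reference ($M - 1 = 1$), one filter tap ($L = 1$) and real signals, so that all correlation matrices become scalars. The sketch below is illustrative only; the function names and the numerical values are assumptions, not taken from the experiments.

```python
def sdr_gsc_weight(p_leak, p_noise, r_noise, inv_mu):
    """Scalar instance of the SDR-GSC solution.

    p_leak  : E{|y_1^s|^2}, speech leakage power in the noise reference
    p_noise : E{|y_1^n|^2}, noise power in the noise reference
    r_noise : E{y_1^n y_0^n}, noise cross-correlation with the speech reference
    inv_mu  : 1/mu, weight on the speech distortion term
    """
    return r_noise / (inv_mu * p_leak + p_noise)


def speech_distortion_energy(w, p_leak):
    """epsilon_d^2 = E{|w y_1^s|^2} for a real scalar filter w."""
    return w * w * p_leak


# Without speech leakage the GSC solution is obtained for every 1/mu.
w_no_leak = sdr_gsc_weight(p_leak=0.0, p_noise=1.0, r_noise=0.8, inv_mu=5.0)

# With speech leakage, a larger 1/mu shrinks the filter and the distortion.
w_gsc = sdr_gsc_weight(p_leak=0.5, p_noise=1.0, r_noise=0.8, inv_mu=0.0)
w_reg = sdr_gsc_weight(p_leak=0.5, p_noise=1.0, r_noise=0.8, inv_mu=1.0)
d_gsc = speech_distortion_energy(w_gsc, 0.5)
d_reg = speech_distortion_energy(w_reg, 0.5)
```

In this toy setting the regularized filter equals the GSC filter when no speech leaks into the noise reference, and it backs off automatically in proportion to the leakage power otherwise, in contrast to a fixed norm constraint.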



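3.3 Second embodiment: SP-SDW-MWF with filter w 0

Since the SDW-MWF (36) takes speech distortion explicitly into account in its optimization criterion, an
additional filter $\mathbf{w}_0$ on the speech reference $y_0[k]$ may be added. The SDW-MWF (36) then solves the
following more general optimization criterion

$$\mathbf{w}_{0:M-1} = \arg\min_{\mathbf{w}_{0:M-1}} \mathcal{E}\left\{\left| y_0^n[k-\Delta] - [\, \mathbf{w}_0^H \;\; \mathbf{w}^H_{1:M-1} \,] \begin{bmatrix} \mathbf{y}_0^n[k] \\ \mathbf{y}^n_{1:M-1}[k] \end{bmatrix} \right|^2\right\} + \frac{1}{\mu}\,\mathcal{E}\left\{\left| [\, \mathbf{w}_0^H \;\; \mathbf{w}^H_{1:M-1} \,] \begin{bmatrix} \mathbf{y}_0^s[k] \\ \mathbf{y}^s_{1:M-1}[k] \end{bmatrix} \right|^2\right\}, \qquad (48)$$

where $\mathbf{w}^H_{0:M-1} = [\, \mathbf{w}_0^H \;\; \mathbf{w}^H_{1:M-1} \,]$ is given by (36).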

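Again, $\mu$ trades off speech distortion and noise reduction. For $\mu = \infty$, speech distortion is completely
ignored so that the solution becomes

$$\mathbf{w}^H_{0:M-1} = [\, \mathbf{u}^H_{\Delta+1} \;\; \mathbf{0}^H_{1:M-1} \,], \qquad (49)$$

with $\mathbf{u}_{\Delta+1}$ the vector whose $(\Delta+1)$-th element equals 1 and whose other elements equal 0 (i.e., $\mathbf{w}_0$ is a
pure delay of $\Delta$ samples), which results in a zero output signal. For $\mu = 0$, all attention is paid to speech
distortion so that the output of the fixed beamformer, delayed by $\Delta$ samples, is obtained.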

• In the absence of speech leakage, i.e., $y_i^s[k] = 0$ for $i = 1, \ldots, M-1$, and for infinitely long filters
$\mathbf{w}_i$, $i = 0, \ldots, M-1$, the SP-SDW-MWF with $\mathbf{w}_0$ corresponds to the cascade of an SDR-GSC and an
SDW Single-channel WF (SDW-SWF) postfilter [30, 35].

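Proof: In case of infinite filter lengths, the SP-SDW-MWF $\mathbf{W}_{0:M-1}(f)$ and its optimization
criterion can be represented in the frequency-domain:

$$\mathbf{W}_{0:M-1}(f) = \arg\min_{\mathbf{W}_{0:M-1}} \mathcal{E}\left\{\left| [\, \exp(-j2\pi f\Delta) - W_0^*(f) \;\; -\mathbf{W}^H_{1:M-1}(f) \,] \begin{bmatrix} Y_0^n(f) \\ \mathbf{Y}^n_{1:M-1}(f) \end{bmatrix} \right|^2\right\} + \frac{1}{\mu}\,\mathcal{E}\left\{\left| [\, W_0^*(f) \;\; \mathbf{W}^H_{1:M-1}(f) \,] \begin{bmatrix} Y_0^s(f) \\ \mathbf{Y}^s_{1:M-1}(f) \end{bmatrix} \right|^2\right\}. \qquad (50)$$

Without loss of generality, we assume, for reasons of simplicity, $\Delta = 0$.
Decompose $\mathbf{W}_{1:M-1}(f)$ as

$$\mathbf{W}_{1:M-1}(f) = (1 - W_0(f))\,\mathbf{W}_{d,1:M-1}(f) \qquad (51)$$

with $W_0(f)$ a single-channel and $\mathbf{W}_{d,1:M-1}(f)$ a multi-channel filter, and define an intermediate
output $V(f)$ (see also Figure 4) as

$$V(f) = Y_0(f) - \mathbf{W}^H_{d,1:M-1}(f)\,\mathbf{Y}_{1:M-1}(f). \qquad (52)$$

Then, the cost function $J(W_0, \mathbf{W}_{d,1:M-1})$ of (50) can be re-written as

$$J = \mathcal{E}\{|(1 - W_0^*(f))\,V^n(f)|^2\} + \frac{1}{\mu}\,\mathcal{E}\{|W_0^*(f)V^s(f) + \mathbf{W}^H_{d,1:M-1}(f)\mathbf{Y}^s_{1:M-1}(f)|^2\}. \qquad (53)$$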

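From $\frac{\partial}{\partial W_0} J(W_0, \mathbf{W}_{d,1:M-1}) = 0$, we find

$$W_0(f) = \left(\mathcal{E}\{V^n V^{n,*}\} + \frac{1}{\mu}\,\mathcal{E}\{V^s V^{s,*}\}\right)^{-1}\left(\mathcal{E}\{V^n V^{n,*}\} - \frac{1}{\mu}\,\mathcal{E}\{V^s\,\mathbf{Y}^{s,H}_{1:M-1}\mathbf{W}_{d,1:M-1}\}\right). \qquad (54)$$

This single-channel filter $W_0(f)$ consists of two terms.

- The first term

$$W_{0,1}(f) = \left(\mathcal{E}\{V^n V^{n,*}\} + \frac{1}{\mu}\,\mathcal{E}\{V^s V^{s,*}\}\right)^{-1}\mathcal{E}\{V^n V^{n,*}\} \qquad (55)$$

estimates the noise component $V^n(f)$ in the intermediate output $V(f)$. The filter $1 - W_{0,1}$
corresponds to an SDW Single-channel Wiener Filter (SDW-SWF) that estimates the speech
component $V^s(f)$.

- The second term

$$W_{0,2}(f) = \left(\mathcal{E}\{V^n V^{n,*}\} + \frac{1}{\mu}\,\mathcal{E}\{V^s V^{s,*}\}\right)^{-1}\left(-\frac{1}{\mu}\,\mathcal{E}\{V^s\,\mathbf{Y}^{s,H}_{1:M-1}\mathbf{W}_{d,1:M-1}\}\right) \qquad (56)$$

estimates the speech leakage filtered by $\mathbf{W}_{d,1:M-1}(f)$, i.e., $-\mathbf{W}^H_{d,1:M-1}\mathbf{Y}^s_{1:M-1}$. The speech
component in the intermediate output $V(f)$ equals $V^s(f) = Y_0^s - \mathbf{W}^H_{d,1:M-1}\mathbf{Y}^s_{1:M-1}$. The filter
$W_{0,2}(f)$ tries to compensate for the distortion $-\mathbf{W}^H_{d,1:M-1}\mathbf{Y}^s_{1:M-1}$ by adding an estimate of
$\mathbf{W}^H_{d,1:M-1}\mathbf{Y}^s_{1:M-1}$ to the output of the SDW-SWF.

In the absence of speech leakage (i.e., $\mathbf{Y}^s_{1:M-1} = 0$), the filter $W_{0,2}(f)$ equals zero and $1 - W_0(f)$
corresponds to an SDW-SWF.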



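From $\frac{\partial}{\partial \mathbf{W}_{d,1:M-1}} J(W_0, \mathbf{W}_{d,1:M-1}) = 0$, we obtain the following solution for $\mathbf{W}_{d,1:M-1}(f)$:

$$\mathbf{W}_{d,1:M-1}(f) = \left(\mathcal{E}\{\mathbf{Y}^n_{1:M-1}\mathbf{Y}^{n,H}_{1:M-1}\} + \frac{1}{\mu}\,\mathcal{E}\{\mathbf{Y}^s_{1:M-1}\mathbf{Y}^{s,H}_{1:M-1}\}\right)^{-1}\left(\mathcal{E}\{\mathbf{Y}^n_{1:M-1}\,Y_0^{n,*}\} - \frac{1}{\mu}\,\frac{W_0(f)}{1 - W_0(f)}\,\mathcal{E}\{\mathbf{Y}^s_{1:M-1}\,Y_0^{s,*}\}\right). \qquad (57)$$

The multi-channel filter $\mathbf{W}_{d,1:M-1}(f)$ also consists of two terms.

- The first term corresponds to the SDR-GSC

$$\left(\mathcal{E}\{\mathbf{Y}^n_{1:M-1}\mathbf{Y}^{n,H}_{1:M-1}\} + \frac{1}{\mu}\,\mathcal{E}\{\mathbf{Y}^s_{1:M-1}\mathbf{Y}^{s,H}_{1:M-1}\}\right)^{-1}\mathcal{E}\{\mathbf{Y}^n_{1:M-1}\,Y_0^{n,*}\} \qquad (58)$$

and estimates the noise component $Y_0^n(f)$ at the output of the fixed beamformer.

- The second term tries to compensate for the speech distortion $-W_0^*(f)Y_0^s(f)$ caused by $W_0(f)$
by adding an estimate of $\frac{W_0^*(f)}{1 - W_0^*(f)}\,Y_0^s(f)$ to the output of the SDR-GSC. Note that this
corresponds to adding an estimate of $W_0^*(f)Y_0^s(f)$ to the output $Z(f)$ of the SP-SDW-MWF.

In the absence of speech leakage, $\mathbf{W}_{d,1:M-1}(f)$ corresponds to an SDR-GSC or a GSC.
Figure 4 illustrates graphically the solution for $\mathbf{W}_{d,1:M-1}(f)$ and $W_0(f)$ for $\Delta = 0$. In the absence of
speech leakage, the filters that try to compensate for the speech distortion equal 0; hence, the SP-SDW-MWF
corresponds to an SDR-GSC (or GSC) with SDW-SWF postfilter. The SP-SDW-MWF achieves
the same or a better signal-to-noise ratio (SNR) improvement than the SDR-GSC, depending on the
noise scenario. ■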
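
The SDW-SWF postfilter identified in the proof acts per frequency bin as a real gain $1 - W_{0,1}(f)$ on the intermediate output. A minimal sketch of that gain follows, assuming the speech and noise powers of the intermediate output in one bin are known; the function name `sdw_swf_gain` and the numbers are hypothetical.

```python
def sdw_swf_gain(p_speech, p_noise, inv_mu):
    """Gain 1 - W_{0,1}(f) of the SDW single-channel Wiener postfilter.

    p_speech : E{V^s V^{s,*}} in the frequency bin
    p_noise  : E{V^n V^{n,*}} in the frequency bin
    inv_mu   : 1/mu, weight on speech distortion
    """
    w01 = p_noise / (p_noise + inv_mu * p_speech)  # first term of W_0(f)
    return 1.0 - w01


gain_wiener = sdw_swf_gain(p_speech=3.0, p_noise=1.0, inv_mu=1.0)   # classical Wiener gain
gain_zero = sdw_swf_gain(p_speech=3.0, p_noise=1.0, inv_mu=0.0)     # mu = infinity
gain_safe = sdw_swf_gain(p_speech=3.0, p_noise=1.0, inv_mu=100.0)   # close to 1
```

For $1/\mu = 1$ the gain reduces to the classical single-channel Wiener gain, for $1/\mu = 0$ it is zero (the zero output signal noted earlier), and for large $1/\mu$ it approaches 1 so that the speech is left nearly undistorted.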

3.4 Experimental results 

This Section illustrates the theoretical results of Section 3.2 and Section 3.3 by means of experimental 
results for a hearing aid application. Section 3.4.1 and Section 3.4.2, respectively, describe the set-up and 
the performance measures that are used. In Section 3.4.3, the impact of the different parameter settings of 
the SP-SDW-MWF on the performance and the sensitivity to signal model errors is evaluated. Comparison 
is made with the QIC-GSC. 

3.4.1 Set-up 

A three-microphone Behind-The-Ear (BTE) hearing aid with three omnidirectional microphones (Knowles
FG-3452) has been mounted on a dummy head in an office room. The interspacing $d$ between the first and
the second microphone is about 1 cm and the interspacing between the second and the third microphone
about 1.5 cm. The reverberation time $T_{60\mathrm{dB}}$ is about 700 ms for a speech weighted noise. The desired
speech signal and the noise signals are uncorrelated. Both the speech and the noise signal have a level of
70 dB SPL at the center of the head. The desired speech source and noise sources are positioned at a distance
of 1 meter from the head: the speech source in front of the head, the noise sources at an angle $\theta$ w.r.t. the
speech source. To get an idea of the average performance based on directivity only, stationary speech and
noise signals with the same average long-term power spectral density are used. The signals can be found
on [36]. The total duration of the input signal is 10 seconds, of which 5 seconds contain noise only and 5
seconds contain both the speech and noise signal. For evaluation purposes, the speech and noise signal have
been recorded separately.

The microphone signals are pre-whitened prior to processing to improve intelligibility [37], and the 
output is accordingly de-whitened. In the experiments, the microphones have been calibrated by means of 
recordings of an anechoic speech weighted noise signal positioned at 0° measured while the microphone 
array was mounted on the head. A delay-and-sum beamformer is used as a fixed beamformer, since -in case 
of small microphone interspacing - it is robust to model errors. The blocking matrix B pairwise subtracts 
the time aligned calibrated microphone signals. 

To investigate the effect of the different parameter settings (i.e., $\mu$, $\mathbf{w}_0$) on the performance only, the
filter coefficients are computed using (36), where $\mathcal{E}\{\mathbf{y}^s_{0:M-1}\mathbf{y}^{s,H}_{0:M-1}\}$ is estimated by means of the clean
speech contributions of the microphone signals. In practice, $\mathcal{E}\{\mathbf{y}^s_{0:M-1}\mathbf{y}^{s,H}_{0:M-1}\}$ is approximated using (30).
The effect of approximation (30) on the performance was found to be small (i.e., differences of at most
0.5 dB in intelligibility weighted signal-to-noise ratio improvement) for the given data set. The QIC-GSC
is implemented using variable loading RLS [19]. The filter length $L$ per channel equals 96.

3.4.2 Performance measures 

To assess the performance of the different approaches, the broadband intelligibility weighted signal-to-noise
ratio improvement [38] is used, defined as

$$\Delta\mathrm{SNR}_{\mathrm{intellig}} = \sum_i I_i\,(\mathrm{SNR}_{i,\mathrm{out}} - \mathrm{SNR}_{i,\mathrm{in}}), \qquad (59)$$

where the band importance function $I_i$ expresses the importance of the $i$-th one-third octave band with
center frequency $f_i^c$ for intelligibility, $\mathrm{SNR}_{i,\mathrm{out}}$ is the output SNR (in dB) and $\mathrm{SNR}_{i,\mathrm{in}}$ is the input SNR
(in dB) in the $i$-th one-third octave band. The center frequencies $f_i^c$ and the values $I_i$ are defined in [39].
The intelligibility weighted signal-to-noise ratio reflects how much intelligibility is improved by the noise
reduction algorithms, but does not take into account speech distortion.

To measure the amount of speech distortion, we define the following intelligibility weighted spectral
distortion measure

$$\mathrm{SD}_{\mathrm{intellig}} = \sum_i I_i\,\mathrm{SD}_i \qquad (60)$$

with $\mathrm{SD}_i$ the average spectral distortion (dB) in the $i$-th one-third octave band, measured as

$$\mathrm{SD}_i = \frac{1}{(2^{1/6} - 2^{-1/6})\,f_i^c}\int_{2^{-1/6} f_i^c}^{2^{1/6} f_i^c} \left|10\log_{10} G^s(f)\right| df, \qquad (61)$$

with $G^s(f)$ the power transfer function of speech from the input to the output of the noise reduction
algorithm.

To exclude the effect of the spatial pre-processor, the performance measures are calculated w.r.t. the
output of the fixed beamformer.
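
Both performance measures are weighted sums over one-third octave bands and can be sketched in a few lines. The band importance values used below are placeholders chosen for illustration only (they are not the values of [39]), and the function names are hypothetical.

```python
def weighted_snr_improvement(importance, snr_out_db, snr_in_db):
    """Intelligibility weighted SNR improvement: sum_i I_i (SNR_i,out - SNR_i,in)."""
    return sum(w * (o - i) for w, o, i in zip(importance, snr_out_db, snr_in_db))


def weighted_spectral_distortion(importance, sd_db):
    """Intelligibility weighted spectral distortion: sum_i I_i SD_i."""
    return sum(w * sd for w, sd in zip(importance, sd_db))


# Placeholder band importance values for three bands (assumed, they sum to 1).
importance = [0.2, 0.5, 0.3]

delta_snr = weighted_snr_improvement(importance, [5.0, 10.0, 8.0], [0.0, 2.0, 3.0])
sd = weighted_spectral_distortion(importance, [1.0, 2.0, 3.0])
```

In practice the per-band SNRs would be computed from the separately recorded speech and noise signals after filtering each with the same noise reduction filter.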

3.4.3 Experimental results 

The impact of the different parameter settings for $\mu$ and $\mathbf{w}_0$ on the performance of the SP-SDW-MWF is
illustrated for a five noise source scenario. The five noise sources are positioned at angles 75°, 120°, 180°, 240°,
285° w.r.t. the desired source at 0°. To assess the sensitivity of the algorithm against errors in the assumed
signal model, the influence of microphone mismatch, e.g., gain mismatch of the second microphone, on
the performance is depicted. Among the different possible signal model errors, microphone mismatch was
found to be especially harmful to the performance of the GSC in a hearing aid application [17]. In hearing
aids, microphones are rarely matched in gain and phase. In [3], gain and phase differences between
microphone characteristics of up to 6 dB and 10°, respectively, have been reported.

SP-SDW-MWF without w 0 (SDR-GSC)

Figure 5 plots the improvement $\Delta\mathrm{SNR}_{\mathrm{intellig}}$ and the speech distortion $\mathrm{SD}_{\mathrm{intellig}}$ as a function of $1/\mu$ obtained
by the SDR-GSC (i.e., the SP-SDW-MWF without filter $\mathbf{w}_0$) for different gain mismatches $\Upsilon_2$ at the second
microphone. In the absence of microphone mismatch, the amount of speech leakage into the noise references
is limited. Hence, the amount of speech distortion is low for all $\mu$. Since there is still a small amount of
speech leakage due to reverberation, the amount of noise reduction and speech distortion slightly decreases
for increasing $1/\mu$, especially for $1/\mu > 1$. In the presence of microphone mismatch, the amount of speech
leakage into the noise references grows. For $1/\mu = 0$ (GSC), the speech gets significantly distorted. Due to
the cancellation of the desired signal, the improvement $\Delta\mathrm{SNR}_{\mathrm{intellig}}$ also degrades. Setting $1/\mu > 0$ improves
the performance of the GSC in the presence of model errors without compromising performance in the
absence of signal model errors.

SP-SDW-MWF with filter w 0 

Figure 6 plots the performance measures $\Delta\mathrm{SNR}_{\mathrm{intellig}}$ and $\mathrm{SD}_{\mathrm{intellig}}$ of the SP-SDW-MWF with filter $\mathbf{w}_0$.
In general, the amount of speech distortion and noise reduction grows for decreasing $1/\mu$. For $\mu = \infty$,
all attention is paid to noise reduction. As also illustrated by Figure 6, this results in a total cancellation
of the speech and the noise signal and hence degraded performance. In the absence of model errors, the
settings $L_0 = 0$ and $L_0 \neq 0$ result, except for $1/\mu = 0$, in the same $\Delta\mathrm{SNR}_{\mathrm{intellig}}$ 7, while the distortion
for the SP-SDW-MWF with $\mathbf{w}_0$ is higher due to the additional single-channel SDW-MWF. For $L_0 \neq 0$, the
performance does not degrade due to the microphone mismatch, in contrast to $L_0 = 0$.

Comparison with QIC 

Figure 7 depicts the improvement $\Delta\mathrm{SNR}_{\mathrm{intellig}}$ and the speech distortion $\mathrm{SD}_{\mathrm{intellig}}$, respectively, of the
QIC-GSC as a function of $\beta^2$. Like the SDR-GSC, the QIC increases the robustness of the GSC. The QIC is
independent of the amount of speech leakage. As a consequence, distortion grows fast with increasing gain
deviation. The constraint value $\beta$ should be chosen so that the maximum permissible speech distortion level
is not exceeded for the largest possible model errors. This comes at the expense of reduced noise reduction for
small model errors. The SDR-GSC, on the other hand, keeps the speech distortion limited for all model errors
(see Figure 5). Attention towards speech distortion is increased if the amount of speech leakage grows. As a
result, a better noise reduction performance is obtained for small model errors, while guaranteeing sufficient
robustness for large model errors. In addition, Figure 6 demonstrates that an additional filter $\mathbf{w}_0$ significantly
improves the performance of the SP-SDW-MWF in the presence of signal model errors.

3.5 Conclusion 

In the present invention, we established a generalized noise reduction scheme, referred to as Spatially
Pre-processed, Speech Distortion Weighted Multi-channel Wiener Filter (SP-SDW-MWF), that consists of a fixed,
spatial pre-processor and an adaptive stage that is based on an SDW-MWF. The new scheme encompasses the
GSC and MWF as special cases. In addition, it allows for an in-between solution that can be interpreted as a
Speech Distortion Regularized GSC. Depending on the setting of a trade-off parameter $\mu$ and the presence
or absence of the filter $\mathbf{w}_0$ on the speech reference, the GSC, the SDR-GSC or a (SDW-)MWF is obtained.

In Section 3.2 and Section 3.3, the different parameter settings of the SP-SDW-MWF have been inter- 
preted. 

• Without $\mathbf{w}_0$, the SP-SDW-MWF corresponds to an SDR-GSC: the ANC design criterion is
supplemented with a regularization term that limits the speech distortion due to signal model errors. The
larger $1/\mu$, the smaller the amount of distortion. For $1/\mu = 0$, distortion is ignored completely, which
corresponds to the GSC solution. The SDR-GSC is then an alternative technique to the QIC-GSC to
decrease the sensitivity of the GSC to signal model errors. In contrast to the QIC-GSC, the SDR-GSC
shifts emphasis towards speech distortion when the amount of speech leakage grows. In the absence
of signal model errors, the performance of the GSC is preserved. As a result, a better noise reduction
performance is obtained for small model errors, while guaranteeing robustness against large model
errors.

7 For $L_0 \neq 0$, the SNR improvement was larger thanks to the single-channel SDW-MWF postfilter (see Section 3.3). For other
noise sources, e.g., a narrowband noise source, a better improvement in $\mathrm{SNR}_{\mathrm{intellig}}$ can also be achieved by $L_0 \neq 0$ thanks to the
single-channel spectral filtering.



• Since the SP-SDW-MWF takes speech distortion explicitly into account, a filter $\mathbf{w}_0$ on the speech
reference can be added. It is shown that, in the absence of speech leakage and for infinitely long filter
lengths, the SP-SDW-MWF corresponds to a cascade of an SDR-GSC with an SDW-SWF postfilter.
In the presence of speech leakage, the SP-SDW-MWF with $\mathbf{w}_0$ tries to preserve its performance:
compared to an SDR-GSC with SDW-SWF postfilter, the SP-SDW-MWF then contains extra filtering
operations that compensate for the performance degradation of the SDR-GSC with SDW-SWF due to
speech leakage. In contrast to the SDR-GSC (and thus also the GSC), performance does not degrade
due to microphone mismatch.

In Section 3.4, experimental results for a hearing aid application confirmed the theoretical results of Sec- 
tion 3.2 and Section 3.3. The SP-SDW-MWF indeed increases the robustness of the GSC against signal 
model errors. Comparison with the widely studied QIC-GSC demonstrated that the SP-SDW-MWF achieves 
a better noise reduction performance for a given maximum allowable speech distortion level. 

4 Third embodiment: Stochastic gradient implementations 

In [22, 27], recursive implementations of the MWF have been proposed based on a GSVD or QR
decomposition. A subband implementation [28] results in improved intelligibility at a significantly lower cost
compared to the fullband approach. These techniques can be extended to implement the SP-SDW-MWF.
However, in contrast to the GSC and the QIC-GSC [14], no cheap stochastic gradient based implementation
of the SP-SDW-MWF is available. In [25], an LMS based algorithm for the MWF has been developed. The
algorithm needs recordings of calibration signals. Since room acoustics, microphone characteristics and the
location of the desired speaker change over time, frequent re-calibration is required, making this approach
cumbersome and expensive. In [26], an LMS based SDW-MWF has been proposed that avoids the need for
calibration signals. The algorithm, however, relies on some independence assumptions that are not necessarily
satisfied. In the present invention, we propose time-domain and frequency-domain stochastic gradient
implementations of the SP-SDW-MWF that preserve the benefit of the matrix-based SP-SDW-MWF over the
QIC-GSC. The LMS based SDW-MWF of [26] is modified so that it applies to the SP-SDW-MWF scheme. In
addition, other stochastic gradient algorithms are developed that achieve a better performance. Experimental
results demonstrate that the proposed stochastic gradient implementation of the SP-SDW-MWF outperforms
the SPA, while its computational cost is limited.

This section is organized as follows. Starting from the cost function of the SP-SDW-MWF, a time-domain
stochastic gradient algorithm is derived in Section 4.1. Applying the independence assumptions
made in [26] results in an LMS based SP-SDW-MWF similar to [26]. To speed up convergence and reduce
complexity, the stochastic gradient and LMS based algorithm are implemented in the frequency-domain.
Both the stochastic gradient and the LMS based algorithm suffer from a large excess error when applied in
highly time-varying noise scenarios. In Section 4.2, we show that the performance of the stochastic gradient
algorithm is improved by applying a low pass filter to the part of the gradient estimate that limits speech
distortion. The low pass filtering avoids a highly time-varying distortion of the desired speech component
while not degrading the tracking performance needed in time-varying noise scenarios. Section 4.3 compares
the performance of the different frequency-domain stochastic gradient algorithms. Experimental results
show that the proposed stochastic gradient algorithm preserves the benefit of the SP-SDW-MWF over the
QIC-GSC.

4.1 Stochastic gradient algorithm

4.1.1 Derivation

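A stochastic gradient algorithm approximates the steepest descent algorithm, using an instantaneous gradient
estimate. Given the cost function (41), the steepest descent algorithm iterates as follows 8

$$\mathbf{w}[n+1] = \mathbf{w}[n] + \frac{\rho}{2}\left(-\frac{\partial J(\mathbf{w})}{\partial \mathbf{w}}\right)_{\mathbf{w}=\mathbf{w}[n]}$$

$$= \mathbf{w}[n] + \rho\left(\mathcal{E}\{\mathbf{y}^n y_0^{n,*}[k-\Delta]\} - \mathcal{E}\{\mathbf{y}^n\mathbf{y}^{n,H}[k]\}\,\mathbf{w}[n] - \frac{1}{\mu}\,\mathcal{E}\{\mathbf{y}^s\mathbf{y}^{s,H}[k]\}\,\mathbf{w}[n]\right), \qquad (62)$$

with $\mathbf{w}[k], \mathbf{y}[k] \in \mathbb{C}^{NL \times 1}$, where $N$ denotes the number of input channels to the adaptive filter and $L$ the
number of filter taps per channel. Replacing the iteration index $n$ by a time index $k$ and leaving out the
expectation values $\mathcal{E}\{\cdot\}$, we obtain the following update equation

$$\mathbf{w}[k+1] = \mathbf{w}[k] + \rho\Big\{\mathbf{y}^n[k]\left(y_0^{n,*}[k-\Delta] - \mathbf{y}^{n,H}[k]\,\mathbf{w}[k]\right) - \underbrace{\frac{1}{\mu}\,\mathbf{y}^s[k]\,\mathbf{y}^{s,H}[k]\,\mathbf{w}[k]}_{\mathbf{r}[k]}\Big\}. \qquad (63)$$

For $1/\mu = 0$ and no filtering $\mathbf{w}_0$ on the speech reference, equation (63) reduces to the update formula used in
the GSC during periods of noise only (i.e., when $y_i[k] = y_i^n[k]$, $i = 0, \ldots, M-1$). The additional term $\mathbf{r}[k]$ in
the gradient estimate limits the speech distortion due to possible signal model errors.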

Equation (63) requires knowledge of the correlation matrix $\mathbf{y}^s\mathbf{y}^{s,H}[k]$ or $\mathcal{E}\{\mathbf{y}^s\mathbf{y}^{s,H}[k]\}$ of the clean
speech. In practice, this information is not available. To avoid the need for calibration, speech + noise
signal vectors $\mathbf{y}_{buf_1}$ are stored into a circular buffer $B_1 \in \mathbb{R}^{N \times L_{buf_1}}$ during processing, as in [26]. During
periods of noise only (i.e., when $y_i[k] = y_i^n[k]$, $i = 0, \ldots, M-1$), the filter $\mathbf{w}$ is updated using the following
approximation of the term $\mathbf{r}[k] = \frac{1}{\mu}\,\mathbf{y}^s\mathbf{y}^{s,H}[k]\,\mathbf{w}[k]$ in (63)

$$\frac{1}{\mu}\,\mathbf{y}^s\mathbf{y}^{s,H}[k]\,\mathbf{w}[k] \approx \frac{1}{\mu}\left(\mathbf{y}_{buf_1}\mathbf{y}^H_{buf_1}[k] - \mathbf{y}\mathbf{y}^H[k]\right)\mathbf{w}[k]. \qquad (64)$$

8 In the sequel, the subscripts $0:M-1$ in the adaptive filter $\mathbf{w}_{0:M-1}$ and the input vector $\mathbf{y}_{0:M-1}$ are omitted for the sake of
conciseness.



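This results in the update formula

$$\mathbf{w}[k+1] = \mathbf{w}[k] + \rho\Big\{\mathbf{y}[k]\left(y_0^*[k-\Delta] - \mathbf{y}^H[k]\,\mathbf{w}[k]\right) - \underbrace{\frac{1}{\mu}\left(\mathbf{y}_{buf_1}\mathbf{y}^H_{buf_1}[k] - \mathbf{y}\mathbf{y}^H[k]\right)\mathbf{w}[k]}_{\mathbf{r}[k]}\Big\} \qquad (65)$$

during periods of noise only. In the sequel, a normalized step size $\rho$ is used, i.e.,

$$\rho = \frac{\rho'}{\frac{1}{\mu}\left|\mathbf{y}^H_{buf_1}\mathbf{y}_{buf_1} - \mathbf{y}^H\mathbf{y}\right| + \mathbf{y}^H\mathbf{y} + \delta}, \qquad (66)$$

where $\delta$ is a very small constant. The absolute value $|\mathbf{y}^H_{buf_1}\mathbf{y}_{buf_1} - \mathbf{y}^H\mathbf{y}|$ has been inserted to guarantee
a positive valued estimate of the clean speech energy $\mathbf{y}^{s,H}\mathbf{y}^s[k]$. Additional storage of noise-only vectors
$\mathbf{y}_{buf_2} \in \mathbb{C}^{ML \times 1}$ in a second buffer $B_2 \in \mathbb{R}^{M \times L_{buf_2}}$ allows $\mathbf{w}$ to be adapted also during periods of
speech + noise, using

$$\mathbf{w}[k+1] = \mathbf{w}[k] + \rho\Big\{\mathbf{y}_{buf_2}\left(y^*_{0,buf_2}[k-\Delta] - \mathbf{y}^H_{buf_2}\mathbf{w}[k]\right) + \frac{1}{\mu}\left(\mathbf{y}_{buf_2}\mathbf{y}^H_{buf_2}[k] - \mathbf{y}\mathbf{y}^H[k]\right)\mathbf{w}[k]\Big\} \qquad (67)$$

with

$$\rho = \frac{\rho'}{\frac{1}{\mu}\left|\mathbf{y}^H\mathbf{y} - \mathbf{y}^H_{buf_2}\mathbf{y}_{buf_2}\right| + \mathbf{y}^H_{buf_2}\mathbf{y}_{buf_2} + \delta}. \qquad (68)$$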

In the sequel, we will, for reasons of conciseness, only consider the update procedure of the time-domain
stochastic gradient algorithms during noise only; hence, $\mathbf{y}[k] = \mathbf{y}^n[k]$. The extension towards updating
during speech + noise periods with the use of a second, noise-only buffer $B_2$ is straightforward: the equations
are found by replacing the noise-only input vectors $\mathbf{y}[k]$ by $\mathbf{y}_{buf_2}[k]$ and the speech + noise vectors $\mathbf{y}_{buf_1}[k]$
by the input speech + noise vector $\mathbf{y}[k]$.
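
The noise-only update with the buffered speech + noise vector and the normalized step size can be sketched for real-valued signals as follows. This is a minimal illustration under stated assumptions, not the reference implementation: the function and argument names are hypothetical, and the regularization term is computed as two inner products rather than by forming the rank-one matrices explicitly.

```python
def dot(a, b):
    """Inner product of two equally long real vectors."""
    return sum(x * y for x, y in zip(a, b))


def sg_update_noise_only(w, y, y0_delayed, y_buf1, inv_mu, rho_prime, delta=1e-8):
    """One real-valued stochastic gradient update during a noise-only period.

    w          : current filter, list of N*L coefficients
    y          : current noise-only input vector
    y0_delayed : delayed speech reference sample y_0[k - Delta]
    y_buf1     : speech + noise vector read from the first buffer
    inv_mu     : 1/mu
    rho_prime  : unnormalized step size
    """
    proj_y = dot(y, w)
    proj_buf = dot(y_buf1, w)
    err = y0_delayed - proj_y
    # r[k] = (1/mu) (y_buf1 y_buf1^T - y y^T) w, without building the matrices
    r = [inv_mu * (b * proj_buf - v * proj_y) for b, v in zip(y_buf1, y)]
    # |.| keeps the clean speech energy estimate non-negative
    speech_energy = abs(dot(y_buf1, y_buf1) - dot(y, y))
    rho = rho_prime / (inv_mu * speech_energy + dot(y, y) + delta)
    return [wi + rho * (v * err - ri) for wi, v, ri in zip(w, y, r)]


w_gsc = sg_update_noise_only([0.0, 0.0], [1.0, 2.0], 1.0, [3.0, 1.0], 0.0, 0.5)
w_sdr = sg_update_noise_only([1.0, 0.0], [1.0, 2.0], 1.0, [3.0, 1.0], 1.0, 0.5)
```

With `inv_mu = 0` the function performs a plain NLMS (GSC-type) update; with `inv_mu > 0` the extra term pulls the filter towards smaller speech distortion and also enlarges the normalization.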
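Using 9

$$\mathbf{w}_{opt} = \left(\frac{1}{\mu}\,\mathcal{E}\{\mathbf{y}_{buf_1}\mathbf{y}^H_{buf_1}\} + \left(1 - \frac{1}{\mu}\right)\mathcal{E}\{\mathbf{y}\mathbf{y}^H\}\right)^{-1}\mathcal{E}\{\mathbf{y}\,y_0^*[k-\Delta]\}, \qquad (69)$$

where $\mathbf{y}$ is a noise-only vector, and (65), it can be shown that

$$\mathcal{E}\{\mathbf{w}[k+1] - \mathbf{w}_{opt}\} = \left(\mathbf{I} - \rho\,\mathcal{E}\left\{\frac{1}{\mu}\,\mathbf{y}_{buf_1}\mathbf{y}^H_{buf_1} + \left(1 - \frac{1}{\mu}\right)\mathbf{y}\mathbf{y}^H\right\}\right)^{k+1}\mathcal{E}\{\mathbf{w}[0] - \mathbf{w}_{opt}\}. \qquad (70)$$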

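Hence, the algorithm (65)-(67) is convergent in the mean provided that the step size $\rho$ is smaller than
$2/\lambda_{max}$, with $\lambda_{max}$ the maximum eigenvalue of $\mathcal{E}\{\frac{1}{\mu}\mathbf{y}_{buf_1}\mathbf{y}^H_{buf_1} + (1 - \frac{1}{\mu})\mathbf{y}\mathbf{y}^H\}$. The similarity of (65) with
standard NLMS lets us presume that setting $\rho < 2/\sum_{i=1}^{NL}\lambda_i$, with $\lambda_i$, $i = 1, \ldots, NL$ the eigenvalues of
$\mathcal{E}\{\frac{1}{\mu}\mathbf{y}_{buf_1}\mathbf{y}^H_{buf_1} + (1 - \frac{1}{\mu})\mathbf{y}\mathbf{y}^H\} \in \mathbb{R}^{NL \times NL}$, or, in case of FIR filters, setting

$$\rho < \frac{2}{L\sum_{i=M-N}^{M-1}\left(\frac{1}{\mu}\,\mathcal{E}\{y^2_{buf_1,i}[k]\} + \left(1 - \frac{1}{\mu}\right)\mathcal{E}\{y_i^2[k]\}\right)} \qquad (71)$$

guarantees convergence in the mean square. Equation (71) explains the normalization (66) and (68) for the
step size $\rho$.

9 When the second order statistics of the noise are short-term stationary, $\mathbf{w}_{opt}$ equals (36).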
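
Because the sum of the eigenvalues of a correlation matrix equals its trace, the mean-square step size bound can be estimated from vector energies alone, without an eigenvalue decomposition. The sketch below makes this explicit under assumptions: real-valued data, sample averages in place of expectations, and a hypothetical function name.

```python
def dot(a, b):
    """Inner product of two equally long real vectors."""
    return sum(x * y for x, y in zip(a, b))


def trace_step_bound(buf1_vectors, noise_vectors, inv_mu):
    """Estimate 2 / trace( (1/mu) E{y_buf1 y_buf1^T} + (1 - 1/mu) E{y y^T} ).

    The trace of E{v v^T} equals E{v^T v}, so only vector energies are needed.
    """
    e_buf = sum(dot(v, v) for v in buf1_vectors) / len(buf1_vectors)
    e_noise = sum(dot(v, v) for v in noise_vectors) / len(noise_vectors)
    return 2.0 / (inv_mu * e_buf + (1.0 - inv_mu) * e_noise)


bound = trace_step_bound(
    buf1_vectors=[[1.0, 1.0], [3.0, 1.0]],   # speech + noise vectors (toy values)
    noise_vectors=[[1.0, 0.0], [0.0, 1.0]],  # noise-only vectors (toy values)
    inv_mu=0.5,
)
```

The denominator has the same structure as the step size normalization, which replaces the averaged energies by their instantaneous values.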

However, since generally

$$\mathbf{y}\mathbf{y}^H[k] \neq \mathbf{y}^n_{buf_1}\mathbf{y}^{n,H}_{buf_1}[k], \qquad (72)$$

the instantaneous gradient estimate in (65) is, compared to (63), additionally perturbed by

$$\frac{1}{\mu}\left(\mathbf{y}\mathbf{y}^H[k] - \mathbf{y}^n_{buf_1}\mathbf{y}^{n,H}_{buf_1}[k]\right)\mathbf{w}[k], \qquad (73)$$

for $\mu \neq \infty$. Hence, for $\mu \neq \infty$, the update equation (65)-(67) suffers from a larger residual excess error
than (63). The additional excess error grows for decreasing $\mu$, increasing step size $\rho$ and increasing vector
length $LN$ of the vector $\mathbf{y}$, with $L$ the filter length per channel and $N$ the number of inputs to the adaptive
filter. It is expected to be especially large for highly time-varying noise, e.g., multi-talker babble noise.

4.1.2 NLMS based algorithm 

In [26], an LMS based implementation of the SDW-MWF has been proposed. Besides (64), some additional, 
independence assumptions are made. Applying these assumptions to (65)-(67), results in an LMS based 
implementation of the SP-SDW-MWF similar to [26]. Assuming that 



yf^y*h[Ml*-*\ =o (74) 
^(i-£)(y[%£/J*]+ywJ%*M) = o, (75) 

hold, with k and I different time instants, (65) can be simplified to 



w[fc + 1J = w[A] + x h W ^| + ^ W(^W - *"Mw[*]) 



where 



d[k] = y 0 [k - A] -7=; x[k] = y/l^ly[k] + yjy^ [*] 
V 1 " 



(76) 



(77) 
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during periods of noise only (i.e., y[k] = y n [fc)). During speech + noise (i.e., y[fe] = y*[k) + y n [k]) t d[k) 
and x[fc] in (76) are set to 

d[k) - y* Mh [k - A]^=;x[fc] = ^JT^y^k) + ^yM. (78) 
V 1 5 

Equations (74) and (75) assume that, besides speech and noise vectors, noise vectors at different
time instants are also mutually uncorrelated. In practice, (74) and (75) do not hold, especially when the
weights $\sqrt{1/\mu}$ and $1/\sqrt{1-1/\mu}$ are large, i.e., for $\mu \to 1$. Hence, compared to (65)-(67), performance is expected to be worse.
In addition, equations (76)-(78) cannot, in contrast to (65), be applied for $\mu < 1$. Compared to (65), no
significant complexity reduction is achieved. The LMS based updating (76) requires $4NL + 3$ Multiply-
Accumulate (MAC) operations per sample (see footnote 10), whereas update formula (65) requires $4NL + 5$ MAC per sample. The
computation of the normalized step size in (76) requires $NL + 2$ fewer MAC per sample than in (65).
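
As an illustration of the NLMS based update, the following minimal sketch performs one noise-only iteration for real-valued signals. It assumes that (76)-(77) read $\mathbf{x}[k] = \sqrt{1-1/\mu}\,\mathbf{y}[k] + \sqrt{1/\mu}\,\mathbf{y}_{buf_1}[k]$ and $d[k] = y_0[k-\Delta]/\sqrt{1-1/\mu}$; the function name and the default values of `rho_prime` and `delta` are hypothetical.

```python
import math


def nlms_sp_sdw_mwf_step(w, y, y_buf1, y0_delayed, mu, rho_prime=0.8, delta=1e-8):
    """One noise-only update in the spirit of (76)-(77), real-valued signals.

    w, y, y_buf1 are lists of equal length (the NL-dimensional vectors),
    y0_delayed is the delayed speech reference sample y0[k - Delta].
    """
    if mu <= 1.0:
        # sqrt(1 - 1/mu) must be real; mu = 1 is excluded as well because
        # d[k] divides by sqrt(1 - 1/mu).
        raise ValueError("this update needs mu > 1")
    a = math.sqrt(1.0 - 1.0 / mu)  # weight of the current noise-only vector
    b = math.sqrt(1.0 / mu)        # weight of the buffered speech + noise vector
    x = [a * yi + b * bi for yi, bi in zip(y, y_buf1)]
    d = y0_delayed / a
    err = d - sum(xi * wi for xi, wi in zip(x, w))
    step = rho_prime / (sum(xi * xi for xi in x) + delta)
    return [wi + step * xi * err for wi, xi in zip(w, x)]
```

For very large `mu` the buffered vector has almost no weight and the step behaves like a standard NLMS update, which is one way to sanity-check an implementation.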

4.1.3 Frequency-domain implementation 

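As stated before, the stochastic gradient algorithms (65)-(67) and (76) are expected to suffer from a large
excess error for large $1/\mu$ and/or highly time-varying noise, due to a large difference between the rank-one
noise correlation matrices $\mathbf{y}^n\mathbf{y}^{n,H}[k]$ measured at different time instants $k$. The gradient estimate can be
improved by replacing

$$\mathbf{y}_{buf_1}[k]\,\mathbf{y}^H_{buf_1}[k] - \mathbf{y}[k]\,\mathbf{y}^H[k] \qquad (79)$$

in (65) with the time-average

$$\frac{1}{K}\sum_{l=k-K+1}^{k}\mathbf{y}_{buf_1}[l]\,\mathbf{y}^H_{buf_1}[l] - \frac{1}{K}\sum_{l=k-K+1}^{k}\mathbf{y}[l]\,\mathbf{y}^H[l], \qquad (80)$$

where $\frac{1}{K}\sum_{l=k-K+1}^{k}\mathbf{y}_{buf_1}[l]\,\mathbf{y}^H_{buf_1}[l]$ is updated during periods of speech + noise and $\frac{1}{K}\sum_{l=k-K+1}^{k}\mathbf{y}[l]\,\mathbf{y}^H[l]$
during periods of noise only. However, this would require expensive matrix operations. A block-based
implementation intrinsically performs this averaging:

$$\mathbf{w}[(k+1)K] = \mathbf{w}[kK] + \frac{\rho}{K}\left\{\sum_{i=0}^{K-1}\mathbf{y}[kK+i]\left(y_0^{*}[kK+i-\Delta] - \mathbf{y}^H[kK+i]\,\mathbf{w}[kK]\right) - \frac{1}{\mu}\sum_{i=0}^{K-1}\left(\mathbf{y}_{buf_1}[kK+i]\,\mathbf{y}^H_{buf_1}[kK+i] - \mathbf{y}[kK+i]\,\mathbf{y}^H[kK+i]\right)\mathbf{w}[kK]\right\}. \qquad (81)$$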



10 Note that the output $y_0[k-\Delta] - \mathbf{w}^H\mathbf{y}[k]$ of the algorithm still has to be computed.

The gradient, and hence also $\mathbf{y}_{buf_1}[k]\,\mathbf{y}^H_{buf_1}[k] - \mathbf{y}[k]\,\mathbf{y}^H[k]$, is averaged over $K$ iterations before adjustments are made
to $\mathbf{w}$. This comes at the expense of a convergence rate that is reduced by a factor $K$.

The block-based implementation is computationally more efficient when it is implemented in the frequency
domain, especially for large filter lengths. In addition, in a frequency-domain implementation each
frequency bin gets its own step size, resulting in faster convergence compared to a time-domain implementation
while not degrading the time-domain MSE. Although the frequency-domain and time-domain implementations
obtain the same MSE, the improvement in $\mathrm{SNR}_{intellig}$, which is determined by the excess errors in each
frequency bin, may be different. In a time-domain implementation, one common step size $\rho$ is used for the
different frequency bins. The convergence rate depends on the eigenvalue spread of the correlation matrix of
the input signals to the adaptive filter and hence on the power spectrum of the input signal. In frequency bins
with little power this common step size will be smaller than in the frequency-domain approach, resulting in
slower convergence and less excess error in that bin. In frequency bins with large power, on the other hand,
this common step size will be larger than in the frequency-domain approach, resulting in a larger LMS excess
error in that frequency bin. Hence, in a time-domain implementation, the power spectrum of the input
signals not only determines the convergence rate but also the improvement $\Delta\mathrm{SNR}_{intellig}$. In a frequency-domain
implementation, the step size is normalized in each frequency bin, so that the different bins have
a similar convergence rate and hence also a similar excess error. Hence, the SNR improvement in each frequency
bin is more controlled (i.e., less dependent on the input power spectrum). Since signal model errors (e.g.,
microphone mismatch) modify the power spectrum of the noise references and hence the convergence rate
and improvement $\Delta\mathrm{SNR}_{intellig}$ of a time-domain implementation, frequency-domain implementations are
more appropriate to evaluate the performance of the algorithms for different signal model errors.
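
The effect of the per-bin normalization can be sketched in a few lines. The example assumes a step-size matrix of the form $\Lambda[k] = \frac{2\rho'}{L}\,\mathrm{diag}\{P_0^{-1}[k], \ldots, P_{2L-1}^{-1}[k]\}$ with a recursively averaged bin power $P_m[k]$, as read from the algorithm listings; the function names are hypothetical and only two bins are shown.

```python
def update_bin_power(p_prev, bin_energy, gamma):
    """P_m[k] = gamma * P_m[k-1] + (1 - gamma) * (input energy in bin m)."""
    return [gamma * p + (1.0 - gamma) * e for p, e in zip(p_prev, bin_energy)]


def bin_step_sizes(p, rho_prime, L):
    """Diagonal of Lambda[k] = (2 * rho' / L) * diag(1 / P_m[k])."""
    return [2.0 * rho_prime / (L * pm) for pm in p]


# One strong bin (energy 100) and one weak bin (energy 1).
power = update_bin_power([0.0, 0.0], [100.0, 1.0], gamma=0.0)
steps = bin_step_sizes(power, rho_prime=0.8, L=32)
# The product step * power is the same in both bins, so both bins adapt at
# a similar normalized rate; a single common step size cannot achieve this.
normalized_rates = [s * p for s, p in zip(steps, power)]
```

With a common time-domain step size the strong bin would dominate the normalization, which is the behaviour the paragraph above describes.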

Algorithm 1 and Algorithm 2 summarize a frequency-domain implementation based on overlap-save
of (65)-(67) and (76), respectively. Algorithm 1 requires $(3N + 4)$ FFTs of length $2L$ and Algorithm 2
$(3N + 3)$ FFTs. By storing the FFT-transformed speech + noise and noise-only vectors in the buffers (see footnote 11)
$\mathbf{B}_1 \in \mathbb{C}^{N \times L_{buf_1}}$ and $\mathbf{B}_2 \in \mathbb{C}^{N \times L_{buf_2}}$, respectively, instead of storing the time-domain vectors, $N$ FFT
operations are saved. When adapting during speech + noise, the time-domain vector

$$[\,y_0[kL-\Delta] \;\cdots\; y_0[kL-\Delta+L-1]\,]^T \qquad (82)$$

should then also be stored in an additional buffer $\mathbf{B}_{2,0} \in \mathbb{R}^{1 \times L_{buf_2}/2}$ during periods of noise only, which, for
$N = M$, results in an additional storage of $L_{buf_2}/2$ words compared to when the time-domain vectors are
stored in the buffers $\mathbf{B}_1$ and $\mathbf{B}_2$.
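
The overlap-save filtering step that both algorithms rely on can be sketched as follows for a single channel with a real-valued $L$-tap filter. This is a minimal illustration, not the listing itself: the naive DFT and all function names are hypothetical helpers, and the conjugation conventions of the listings are not reproduced (they make no difference for real filters).

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) DFT; good enough for a small illustration."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * f * t / n) for t in range(n))
           for f in range(n)]
    return [v / n for v in out] if inverse else out


def overlap_save_block(prev_block, new_block, w):
    """Filter one block of L samples with an L-tap filter via a 2L-point DFT."""
    L = len(w)
    Y = dft(prev_block + new_block)      # F [y[kL-L] ... y[kL+L-1]]^T
    W = dft(list(w) + [0.0] * L)         # zero-padded filter
    circ = dft([a * b for a, b in zip(Y, W)], inverse=True)
    return [c.real for c in circ[L:]]    # k = [0_L I_L]: keep the last L samples


def direct_block(prev_block, new_block, w):
    """Reference: the same L output samples by time-domain convolution."""
    buf = prev_block + new_block
    L = len(w)
    return [sum(w[l] * buf[L + n - l] for l in range(L)) for n in range(L)]
```

The last $L$ samples of the circular convolution are free of wrap-around, which is why only they are kept; the first $L$ samples are discarded.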

Remark: In Algorithms 1 and 2, a common trade-off parameter $\mu$ is used in all frequency bins. Alternatively,
a different setting for $\mu$ can be used in different frequency bins. E.g., for the SP-SDW-MWF with $\mathbf{w}_0 = 0$,
$\mu$ could be set to $\infty$ (i.e., $1/\mu = 0$) at those frequencies where the GSC is sufficiently robust, e.g., for small-sized arrays at

11 Since the input signals are real, half of the FFT components are complex-conjugated. Hence, in practice only half of the
complex FFT components have to be stored in memory.

Algorithm 1 Frequency-domain stochastic gradient SP-SDW-MWF based on overlap-save.

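Initialization:

    $\mathbf{W}_i[0] = [\,0 \;\cdots\; 0\,]^T$, $i = M-N, \ldots, M-1$
    $P_m[0] = \delta_m$, $m = 0, \ldots, 2L-1$

Matrix definitions:

    $\mathbf{g} = \begin{bmatrix} \mathbf{I}_L & \mathbf{0}_L \\ \mathbf{0}_L & \mathbf{0}_L \end{bmatrix}$; $\mathbf{k} = [\,\mathbf{0}_L \;\; \mathbf{I}_L\,]$; $\mathbf{F}$ = $2L \times 2L$ DFT matrix

For each new block of $NL$ input samples:

• If noise detected:

    1. $\mathbf{F}\,[\,y_i[kL-L] \;\cdots\; y_i[kL+L-1]\,]^T$, $i = M-N, \ldots, M-1$ → noise buffer $\mathbf{B}_2$
       $[\,y_0[kL-\Delta] \;\cdots\; y_0[kL-\Delta+L-1]\,]^T$ → noise buffer $\mathbf{B}_{2,0}$

    2. $\mathbf{Y}_i^n[k] = \mathrm{diag}\{\mathbf{F}\,[\,y_i[kL-L] \;\cdots\; y_i[kL+L-1]\,]^T\}$, $i = M-N, \ldots, M-1$
       $\mathbf{Y}_i[k] = \mathrm{diag}\{[\,\mathbf{B}_1(i,0) \;\cdots\; \mathbf{B}_1(i,2L-1)\,]^T\}$, $i = M-N, \ldots, M-1$
       cyclically shift each row $i$ of $\mathbf{B}_1$ over $2L$ samples, $i = M-N, \ldots, M-1$
       $\mathbf{d}[k] = [\,0 \;\cdots\; 0 \;\; y_0[kL-\Delta] \;\cdots\; y_0[kL-\Delta+L-1]\,]^T$

• If speech detected:

    1. $\mathbf{F}\,[\,y_i[kL-L] \;\cdots\; y_i[kL+L-1]\,]^T$, $i = M-N, \ldots, M-1$ → speech + noise buffer $\mathbf{B}_1$

    2. $\mathbf{Y}_i^n[k] = \mathrm{diag}\{[\,\mathbf{B}_2(i,0) \;\cdots\; \mathbf{B}_2(i,2L-1)\,]^T\}$, $i = M-N, \ldots, M-1$
       cyclically shift each row $i$ of $\mathbf{B}_2$ over $2L$ samples, $i = M-N, \ldots, M-1$
       $\mathbf{Y}_i[k] = \mathrm{diag}\{\mathbf{F}\,[\,y_i[kL-L] \;\cdots\; y_i[kL+L-1]\,]^T\}$, $i = M-N, \ldots, M-1$
       $\mathbf{d}[k] = [\,0 \;\cdots\; 0 \;\; \mathbf{B}_{2,0}(1,0) \;\cdots\; \mathbf{B}_{2,0}(1,L-1)\,]^T$
       cyclically shift $\mathbf{B}_{2,0}$ over $L$ samples

• Update formula:

    1. $\mathbf{e}_1[k] = \mathbf{k}\mathbf{F}^{-1}\sum_{j=M-N}^{M-1}\mathbf{Y}_j^n[k]\,\mathbf{W}_j^{*}[k] = \mathbf{y}_{out,1}$
       $\mathbf{e}[k] = \mathbf{d}[k] - \mathbf{e}_1[k]$
       $\mathbf{e}_2[k] = \mathbf{k}\mathbf{F}^{-1}\sum_{j=M-N}^{M-1}\mathbf{Y}_j[k]\,\mathbf{W}_j^{*}[k] = \mathbf{y}_{out,2}$
       $\mathbf{E}_1[k] = \mathbf{F}\mathbf{k}^T\mathbf{e}_1[k]$; $\mathbf{E}_2[k] = \mathbf{F}\mathbf{k}^T\mathbf{e}_2[k]$; $\mathbf{E}[k] = \mathbf{F}\mathbf{k}^T\mathbf{e}[k]$

    2. $\mathbf{\Lambda}[k] = \frac{2\rho'}{L}\,\mathrm{diag}\{P_0^{-1}[k], \ldots, P_{2L-1}^{-1}[k]\}$
       $P_m[k] = \gamma P_m[k-1] + (1-\gamma)\left(\sum_{j=M-N}^{M-1}|Y_{j,m}^n|^2 + \frac{1}{\mu}\left|\sum_{j=M-N}^{M-1}\left(|Y_{j,m}|^2 - |Y_{j,m}^n|^2\right)\right|\right)$

    3. $\mathbf{W}_i[k+1] = \mathbf{W}_i[k] + \mathbf{F}\mathbf{g}\mathbf{F}^{-1}\mathbf{\Lambda}[k]\left\{\mathbf{Y}_i^{n,H}[k]\,\mathbf{E}^{*}[k] - \frac{1}{\mu}\left(\mathbf{Y}_i^H[k]\,\mathbf{E}_2^{*}[k] - \mathbf{Y}_i^{n,H}[k]\,\mathbf{E}_1^{*}[k]\right)\right\}$, $i = M-N, \ldots, M-1$

• Output: $\mathbf{y}_0[k] = [\,y_0[kL-\Delta] \;\cdots\; y_0[kL-\Delta+L-1]\,]^T$

    - If noise detected: $\mathbf{y}_{out}[k] = \mathbf{y}_0[k] - \mathbf{y}_{out,1}[k]$
    - If speech detected: $\mathbf{y}_{out}[k] = \mathbf{y}_0[k] - \mathbf{y}_{out,2}[k]$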



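Algorithm 2 Frequency-domain NLMS based SP-SDW-MWF based on overlap-save.

Initialization:

    $\mathbf{W}_i[0] = [\,0 \;\cdots\; 0\,]^T$, $i = M-N, \ldots, M-1$
    $P_m[0] = \delta_m$, $m = 0, \ldots, 2L-1$

Matrix definitions:

    $\mathbf{g} = \begin{bmatrix} \mathbf{I}_L & \mathbf{0}_L \\ \mathbf{0}_L & \mathbf{0}_L \end{bmatrix}$; $\mathbf{k} = [\,\mathbf{0}_L \;\; \mathbf{I}_L\,]$; $\mathbf{F}$ = $2L \times 2L$ DFT matrix

For each new block of $NL$ input samples:

• If noise detected:

    1. $\mathbf{F}\,[\,y_i[kL-L] \;\cdots\; y_i[kL+L-1]\,]^T$, $i = M-N, \ldots, M-1$ → noise buffer $\mathbf{B}_2$
       $[\,y_0[kL-\Delta] \;\cdots\; y_0[kL-\Delta+L-1]\,]^T$ → noise buffer $\mathbf{B}_{2,0}$

    2. $\mathbf{Y}_i[k] = \mathrm{diag}\{\mathbf{F}\,[\,y_i[kL-L] \;\cdots\; y_i[kL+L-1]\,]^T\}$, $i = M-N, \ldots, M-1$
       $\mathbf{Y}_{i,buf_1}[k] = \mathrm{diag}\{[\,\mathbf{B}_1(i,0) \;\cdots\; \mathbf{B}_1(i,2L-1)\,]^T\}$, $i = M-N, \ldots, M-1$
       $\mathbf{X}_i[k] = \sqrt{1-\tfrac{1}{\mu}}\;\mathbf{Y}_i[k] + \sqrt{\tfrac{1}{\mu}}\;\mathbf{Y}_{i,buf_1}[k]$, $i = M-N, \ldots, M-1$
       cyclically shift each row $i$ of buffer $\mathbf{B}_1$ over $2L$ samples
       $\mathbf{d}[k] = \frac{1}{\sqrt{1-1/\mu}}\,[\,0 \;\cdots\; 0 \;\; y_0[kL-\Delta] \;\cdots\; y_0[kL-\Delta+L-1]\,]^T$

• If speech detected:

    1. $\mathbf{F}\,[\,y_i[kL-L] \;\cdots\; y_i[kL+L-1]\,]^T$, $i = M-N, \ldots, M-1$ → speech + noise buffer $\mathbf{B}_1$

    2. $\mathbf{Y}_i[k] = \mathrm{diag}\{\mathbf{F}\,[\,y_i[kL-L] \;\cdots\; y_i[kL+L-1]\,]^T\}$, $i = M-N, \ldots, M-1$
       $\mathbf{Y}_{i,buf_2}[k] = \mathrm{diag}\{[\,\mathbf{B}_2(i,0) \;\cdots\; \mathbf{B}_2(i,2L-1)\,]^T\}$, $i = M-N, \ldots, M-1$
       $\mathbf{X}_i[k] = \sqrt{1-\tfrac{1}{\mu}}\;\mathbf{Y}_{i,buf_2}[k] + \sqrt{\tfrac{1}{\mu}}\;\mathbf{Y}_i[k]$, $i = M-N, \ldots, M-1$
       cyclically shift each row $i$ of buffer $\mathbf{B}_2$ over $2L$ samples
       $\mathbf{d}[k] = \frac{1}{\sqrt{1-1/\mu}}\,[\,0 \;\cdots\; 0 \;\; \mathbf{B}_{2,0}(1,0) \;\cdots\; \mathbf{B}_{2,0}(1,L-1)\,]^T$
       cyclically shift $\mathbf{B}_{2,0}$ over $L$ samples

• Update formula:

    1. $\mathbf{E}[k] = \mathbf{F}\mathbf{k}^T\left(\mathbf{d}[k] - \mathbf{k}\mathbf{F}^{-1}\sum_{j=M-N}^{M-1}\mathbf{X}_j[k]\,\mathbf{W}_j^{*}[k]\right)$

    2. $\mathbf{\Lambda}[k] = \frac{2\rho'}{L}\,\mathrm{diag}\{P_0^{-1}[k], \ldots, P_{2L-1}^{-1}[k]\}$
       $P_m[k] = \gamma P_m[k-1] + (1-\gamma)\sum_{j=M-N}^{M-1}|X_{j,m}|^2$

    3. $\mathbf{W}_i[k+1] = \mathbf{W}_i[k] + \mathbf{F}\mathbf{g}\mathbf{F}^{-1}\mathbf{\Lambda}[k]\,\mathbf{X}_i^H[k]\,\mathbf{E}^{*}[k]$, $i = M-N, \ldots, M-1$

• Output: $\mathbf{y}_0[k] - \mathbf{k}\mathbf{F}^{-1}\sum_{j=M-N}^{M-1}\mathbf{Y}_j[k]\,\mathbf{W}_j^{*}[k]$, with

    $\mathbf{y}_0[k] = [\,y_0[kL-\Delta] \;\cdots\; y_0[kL-\Delta+L-1]\,]^T$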



high frequencies. In that case, only a few frequency components of $\mathbf{Y}_i$ need to be stored in the speech + noise
buffer.

4.2 Improvement of stochastic gradient algorithm

To achieve a reliable estimate (80) of the average correlation matrix $\varepsilon\{\mathbf{y}^s\mathbf{y}^{s,H}\}$ in highly time-varying
noise scenarios (e.g., multi-talker babble; see footnote 12), $K$ should be much larger than $LN$. Hence, the averaging in
the block-based (see footnote 13) or frequency-domain implementation proposed in Section 4.1 does not suffice to obtain
a good estimate of $\varepsilon\{\mathbf{y}^s\mathbf{y}^{s,H}\}$. In this Section, we show that the performance of the stochastic gradient
algorithm is improved by applying a low pass filter to the part of the gradient estimate that takes speech
distortion into account, i.e., the term $\mathbf{r}[k]$ in (65). The low pass filtering avoids a highly time-varying
distortion of the desired speech component while not degrading the tracking performance needed in
non-stationary noise scenarios.

4.2.1 Concept 
Define $\mathbf{w}^s$ as (see footnote 14)

$$\mathbf{w}^s = \mathbf{w} - \mathbf{w}^n \qquad (83)$$
$$\mathbf{w}^s \in \mathrm{Range}\{\varepsilon\{\mathbf{y}^s\mathbf{y}^{s,H}\}\} \qquad (84)$$
$$\varepsilon\{\mathbf{y}^s\mathbf{y}^{s,H}\}\,\mathbf{w}^n = 0. \qquad (85)$$

Then, the desired speech component $y^s_{out}[k]$ at the output equals

$$y^s_{out}[k] = y_0^s[k-\Delta] - \mathbf{w}^{s,H}\,\mathbf{y}^s[k]. \qquad (86)$$

Assume that $\mathbf{w}^s$ varies slowly in time. This is desired since a fast changing $\mathbf{w}^s$ results in a highly
time-varying distortion of the desired speech, and may thus harm sound quality. In addition, in hearing aid
applications the average correlation matrix $\varepsilon\{\mathbf{y}^s\mathbf{y}^{s,H}\}$ is slowly time-varying, as microphone characteristics,
room acoustics and the average desired speaker position do not change quickly in time. Fast changes in the
noise scenario can be tracked by the filter $\mathbf{w}^n$. This will be illustrated in Section 4.3.
Then,

$$\varepsilon\{\mathbf{y}^s\mathbf{y}^{s,H}\}\,\mathbf{w}[k] = \varepsilon\{\mathbf{y}^s\mathbf{y}^{s,H}\}\,\mathbf{w}^s \qquad (87)$$

can be approximated by (see footnote 15)

$$\varepsilon\{\mathbf{y}_{buf_1}\mathbf{y}^H_{buf_1} - \mathbf{y}\mathbf{y}^H\}\,\mathbf{w}^s = \varepsilon\{(\mathbf{y}_{buf_1}\mathbf{y}^H_{buf_1} - \mathbf{y}\mathbf{y}^H)\,\mathbf{w}^s\}, \qquad (88)$$

12 As for the QR and GSVD based algorithms, we assume short-term stationarity of the second order statistics of the noise, so
that $\varepsilon\{\mathbf{y}^n\mathbf{y}^{n,H}\} \approx \varepsilon\{\mathbf{y}^n_{buf_1}\mathbf{y}^{n,H}_{buf_1}\}$. Higher order statistics are allowed to vary faster in time.
13 A large $K \gg LN$ in block-LMS would result in a too slow convergence rate.

14 $\varepsilon\{\mathbf{y}^s\mathbf{y}^{s,H}\}$ is rank deficient when the speech leakage $\mathbf{y}^s$ in the noise references does not cover the whole frequency spectrum
or when the number of inputs $N$ to the adaptive filter exceeds 1 and the direct-to-reverberant ratio of the desired speech is high.

15 Just as for the matrix based algorithms, the noise correlation matrix $\varepsilon\{\mathbf{y}^n\mathbf{y}^{n,H}\}$ is assumed to be short-term stationary so
that it can be estimated during periods of noise only.



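where $\mathbf{y}$ is a vector during noise only. Using the independence assumption [40]

$$\varepsilon\{\mathbf{y}^n\mathbf{y}^{n,H}[k]\,\mathbf{w}^n[k]\} = \varepsilon\{\mathbf{y}^n\mathbf{y}^{n,H}[k]\}\,\varepsilon\{\mathbf{w}^n[k]\} \qquad (89)$$

and $\varepsilon\{\mathbf{y}^n\mathbf{y}^{n,H}\} = \varepsilon\{\mathbf{y}^n_{buf_1}\mathbf{y}^{n,H}_{buf_1}\}$, we find that

$$\varepsilon\{(\mathbf{y}_{buf_1}\mathbf{y}^H_{buf_1} - \mathbf{y}\mathbf{y}^H)\,\mathbf{w}^s\} = \varepsilon\{(\mathbf{y}_{buf_1}\mathbf{y}^H_{buf_1} - \mathbf{y}\mathbf{y}^H)\,\mathbf{w}[k]\}. \qquad (90)$$

Replacing the expectation value by time averaging, $\varepsilon\{\mathbf{y}^s\mathbf{y}^{s,H}\}\,\mathbf{w}[k]$ can be estimated as

$$\frac{1}{K}\sum_{l=k-K+1}^{k}\left(\mathbf{y}_{buf_1}\mathbf{y}^H_{buf_1}[l] - \mathbf{y}\mathbf{y}^H[l]\right)\mathbf{w}[l] \qquad (91)$$

during noise only (see footnote 16). The value $K$ determines the convergence rate of the filter $\mathbf{w}^s$.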

Remark: In order to obtain a good estimate of $\varepsilon\{\mathbf{y}^s\mathbf{y}^{s,H}\}$, the long-term averaged noise correlation
matrices $\frac{1}{K}\sum_{l=k-K+1}^{k}\mathbf{y}^n\mathbf{y}^{n,H}[l]$ and $\frac{1}{K}\sum_{l=k-K+1}^{k}\mathbf{y}^n_{buf_1}\mathbf{y}^{n,H}_{buf_1}[l]$ should not differ too much from each other. This
does not require the second order statistics of the noise source to be stationary for about $K$ time samples.
It suffices that they are short-term stationary so that they can be estimated during noise-only periods.

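The averaging operation (91) is performed by applying the following low pass filter to the term
$\mathbf{r}[k] = \frac{1}{\mu}\left(\mathbf{y}_{buf_1}\mathbf{y}^H_{buf_1} - \mathbf{y}\mathbf{y}^H\right)\mathbf{w}[k]$ in (65):

$$\mathbf{r}[k] = \lambda\,\mathbf{r}[k-1] + (1-\lambda)\frac{1}{\mu}\left(\mathbf{y}_{buf_1}\mathbf{y}^H_{buf_1} - \mathbf{y}\mathbf{y}^H\right)\mathbf{w}[k], \qquad (92)$$

where $\lambda < 1$. This corresponds to an averaging window $K$ of about $\frac{1}{1-\lambda}$ samples. The normalized step size
$\rho$ is modified into

$$\rho = \frac{\rho'}{r_{avg}[k] + \mathbf{y}^H\mathbf{y} + \delta} \qquad (93)$$

$$r_{avg}[k] = \lambda\,r_{avg}[k-1] + (1-\lambda)\frac{1}{\mu}\left|\mathbf{y}^H_{buf_1}\mathbf{y}_{buf_1} - \mathbf{y}^H\mathbf{y}\right| \qquad (94)$$

Compared to (65), (92) requires $3NL - 1$ additional MAC and extra storage of an $NL \times 1$ vector $\mathbf{r}$.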
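
The low pass filtered regularization term and the modified step size can be sketched as follows for real-valued signals. This is a minimal illustration of (92)-(94) as read above, computed with inner products only so that no $NL \times NL$ matrix is formed; the function names and default parameter values are hypothetical.

```python
def lp_regularisation_step(r_prev, r_avg_prev, w, y, y_buf1, mu, lam):
    """Return (r[k], r_avg[k]) following (92) and (94), real-valued signals."""
    # (y_buf1 y_buf1^H - y y^H) w needs only two inner products.
    proj_buf = sum(b * wi for b, wi in zip(y_buf1, w))
    proj_y = sum(yi * wi for yi, wi in zip(y, w))
    inst = [(b * proj_buf - yi * proj_y) / mu for b, yi in zip(y_buf1, y)]
    r = [lam * rp + (1.0 - lam) * ri for rp, ri in zip(r_prev, inst)]
    energy_diff = sum(b * b for b in y_buf1) - sum(yi * yi for yi in y)
    r_avg = lam * r_avg_prev + (1.0 - lam) * abs(energy_diff) / mu
    return r, r_avg


def normalised_step(r_avg, y, rho_prime=0.8, delta=1e-8):
    """rho = rho' / (r_avg[k] + y^H y + delta), cf. (93)."""
    return rho_prime / (r_avg + sum(yi * yi for yi in y) + delta)
```

Setting `lam = 0` recovers the instantaneous term of (65); values of `lam` close to 1 average over roughly `1 / (1 - lam)` samples.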



4.2.2 Frequency-domain 

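Equation (92) can be extended to the frequency domain. The update equation for $\mathbf{W}_i[k+1]$ in Algorithm 1
then becomes:

16 As also mentioned in Section 4.1, the noise-only vector $\mathbf{y}[k]$ should be replaced by $\mathbf{y}_{buf_2}[k]$ and the speech + noise vector
$\mathbf{y}_{buf_1}[k]$ by $\mathbf{y}[k]$ when adapting during periods of speech + noise.

$$\mathbf{W}_i[k+1] = \mathbf{W}_i[k] + \mathbf{F}\mathbf{g}\mathbf{F}^{-1}\mathbf{\Lambda}[k]\left(\mathbf{Y}_i^{n,H}[k]\,\mathbf{E}^{*}[k] - \mathbf{R}_i[k]\right);$$
$$\mathbf{R}_i[k] = \lambda\,\mathbf{R}_i[k-1] + (1-\lambda)\frac{1}{\mu}\left(\mathbf{Y}_i^H[k]\,\mathbf{E}_2^{*}[k] - \mathbf{Y}_i^{n,H}[k]\,\mathbf{E}_1^{*}[k]\right) \qquad (95)$$

with

$$\mathbf{E}[k] = \mathbf{F}\mathbf{k}^T\left(\mathbf{y}_0[k] - \mathbf{k}\mathbf{F}^{-1}\sum_{j=M-N}^{M-1}\mathbf{Y}_j^n[k]\,\mathbf{W}_j^{*}[k]\right); \qquad (96)$$

$$\mathbf{E}_1[k] = \mathbf{F}\mathbf{k}^T\mathbf{k}\mathbf{F}^{-1}\sum_{j=M-N}^{M-1}\mathbf{Y}_j^n[k]\,\mathbf{W}_j^{*}[k]; \qquad (97)$$

$$\mathbf{E}_2[k] = \mathbf{F}\mathbf{k}^T\mathbf{k}\mathbf{F}^{-1}\sum_{j=M-N}^{M-1}\mathbf{Y}_j[k]\,\mathbf{W}_j^{*}[k], \qquad (98)$$

and $\mathbf{\Lambda}[k]$ computed as follows:

$$\mathbf{\Lambda}[k] = \frac{2\rho'}{L}\,\mathrm{diag}\{P_0^{-1}[k], \ldots, P_{2L-1}^{-1}[k]\}$$
$$P_m[k] = P_{1,m}[k] + P_{2,m}[k]$$
$$P_{1,m}[k] = \gamma\,P_{1,m}[k-1] + (1-\gamma)\sum_{j=M-N}^{M-1}|Y_{j,m}^n|^2$$
$$P_{2,m}[k] = \lambda\,P_{2,m}[k-1] + (1-\lambda)\frac{1}{\mu}\left|\sum_{j=M-N}^{M-1}\left(|Y_{j,m}|^2 - |Y_{j,m}^n|^2\right)\right|$$

Compared to Algorithm 1, (95)-(98) requires one extra $2L$-point FFT and $8NL - 2N - 2L$ extra MAC per
$L$ samples, and additional memory storage of a $2NL \times 1$ real data vector. To obtain the same time constant
in the averaging operation as in the time-domain version with $K = 1$, $\lambda$ should equal $\lambda^L$.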
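
A short numerical sketch of this relation: an exponential window with factor $\lambda$ spans about $1/(1-\lambda)$ updates, and one frequency-domain update covers $L$ samples. The helper names are hypothetical, and reading $\lambda = 0.9998$ as a per-block factor with $L = 32$ is an assumption made only for the example.

```python
def averaging_window(lam):
    """Effective length (in updates) of the exponential window."""
    return 1.0 / (1.0 - lam)


def block_lambda(lam_per_sample, L):
    """Per-block factor that matches a per-sample factor over L samples."""
    return lam_per_sample ** L


L = 32
lam_block = 0.9998
blocks = averaging_window(lam_block)      # about 5000 blocks
samples = blocks * L                      # about 160000 samples
lam_sample = lam_block ** (1.0 / L)       # equivalent per-sample factor
```

Raising the per-sample factor to the power $L$ gives back the per-block factor, so both versions forget old data at the same rate in wall-clock time.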

Experimental results in Section 4.3 will show that the performance of the stochastic gradient algorithm
improves significantly with the low pass filter, especially for large $\lambda$.

4.2.3 Complexity of different stochastic gradient algorithms 

Table 1 summarizes the computational complexity (expressed as the number of real multiply-accumulate
(MAC) operations (see footnote 17), divisions (D), square roots (Sq) and absolute values (Abs)) of the time-domain (TD) and
frequency-domain (FD) Stochastic Gradient (SG) and NLMS based algorithms. Comparison is made with standard
NLMS and the NLMS based SPA. We assume that one complex multiplication is equivalent to 4 real
multiplications and 2 real additions. A $2L$-point FFT of a real input vector requires $2L\log_2 2L$ real MAC
(assuming radix-2 FFT algorithms).

17 Counted as the number of multiply-accumulate operations, additions and multiplications.

Table 1 indicates that the TD-SG without filter $\mathbf{w}_0$ and the SPA are about twice as complex as the
standard ANC. When applying a Low Pass filter (LP) to the regularization term, the TD-SG algorithm
has about three times the complexity of the ANC. The increase in complexity of the frequency-domain
implementations is less.



Table 1: Computational complexity of TD and FD NLMS and stochastic gradient algorithms (expressed as
number of real MAC, divisions (D), absolute values (Abs) and square roots (Sq) per sample)

Domain | Algorithm            | Update formula                                                | Adaptation of step size
-------+----------------------+---------------------------------------------------------------+-----------------------------
TD     | NLMS ANC             | ((2M - 2)L + 1) MAC                                           | 1 D + (M - 1)L MAC
TD     | NLMS based SPA       | (4(M - 1)L + 1) MAC + 1 D + 1 Sq                              | 1 D + (M - 1)L MAC
TD     | SG                   | (4NL + 5) MAC                                                 | 1 D + 1 Abs + (2NL + 2) MAC
TD     | NLMS based algorithm | (4NL + 3) MAC                                                 | 1 D + NL MAC
TD     | SG with LP           | (7NL + 4) MAC                                                 | 1 D + 1 Abs + (2NL + 4) MAC
FD     | NLMS ANC             | (10M - 7 - [...]) + (6M - 2) log2(2L) MAC                     | 1 D + (2M + 2) MAC
FD     | NLMS based SPA       | (14M - 11 - [...]) + (6M - 2) log2(2L) MAC + 1/L Sq + 1/L D   | 1 D + (2M + 2) MAC
FD     | SG                   | (18N + 6 - [...]) + (6N + 8) log2(2L) MAC                     | 1 D + 1 Abs + (4N + 4) MAC
FD     | NLMS based algorithm | (16N + 4 - [...]) + (6N + 6) log2(2L) MAC                     | 1 D + (2N + 2) MAC
FD     | SG with LP           | (26N + 4 - [...]) + (6N + 10) log2(2L) MAC                    | 1 D + 1 Abs + (4N + 6) MAC

Remark: In Table 1 and Figure 8, the complexity of the time-domain and frequency-domain NLMS ANC and
NLMS based SPA represents the complexity when the adaptive filter is only updated during noise only. If
the adaptive filter is also updated during speech + noise using data from a noise buffer, the time-domain
implementations require $NL$ additional MAC per sample and the frequency-domain implementations require
2 additional FFTs and $(4L(M-1) - 2(M-1) + L)$ MAC per $L$ samples.

As an illustration, Figure 8 plots the complexity (expressed as the number of Mega operations per second
(Mops)) of the time-domain and frequency-domain stochastic gradient algorithm with LP filtering as a
function of $L$ for $M = 3$ and a sampling frequency $f_s = 16$ kHz. Comparison is made with the NLMS
based ANC of the GSC and the SPA. The complexity of the FD SPA is not depicted since, for small $M$,
it is comparable to the cost of the FD-NLMS ANC. For $L > 8$, the frequency-domain implementations
result in a significantly lower complexity compared to their time-domain equivalents. The computational
cost of the FD stochastic gradient algorithm with LP is limited, making it a good alternative to the SPA for
implementation in hearing aids.
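
The time-domain rows of Table 1 can be evaluated directly to check the "about twice" and "about three times" statements. The sketch below counts MAC operations only (divisions, absolute values and square roots are ignored) and assumes $N = M - 1$ for the variant without $\mathbf{w}_0$; the function name is hypothetical.

```python
def td_macs_per_sample(M, N, L):
    """MAC counts (update formula + step size) from the TD rows of Table 1."""
    return {
        "NLMS ANC": ((2 * M - 2) * L + 1) + (M - 1) * L,
        "SG": (4 * N * L + 5) + (2 * N * L + 2),
        "NLMS based": (4 * N * L + 3) + N * L,
        "SG with LP": (7 * N * L + 4) + (2 * N * L + 4),
    }


counts = td_macs_per_sample(M=3, N=2, L=32)
ratio_sg = counts["SG"] / counts["NLMS ANC"]            # roughly 2
ratio_lp = counts["SG with LP"] / counts["NLMS ANC"]    # roughly 3
mega_macs_anc = counts["NLMS ANC"] * 16000 / 1e6        # at f_s = 16 kHz
```

For $M = 3$ and $L = 32$ this gives 193, 391 and 584 MAC per sample for the ANC, the SG and the SG with LP, respectively.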

4.3 Experimental results 

In this Section, we evaluate the performance of the different FD stochastic gradient algorithms based on
experimental results for a hearing aid application. Comparison is made with the FD-NLMS based SPA. For
a fair comparison, the FD-NLMS based SPA is, like the stochastic gradient algorithms, also adapted during
speech + noise using data from a noise buffer.

4.3.1 Set-up

A three-microphone Behind-The-Ear (BTE) hearing aid with three omnidirectional microphones (Knowles
FG-3452) has been mounted on a dummy head in an office room. The interspacing $d$ between the first and
the second microphone is about $d = 1$ cm and the interspacing between the second and third microphone
about 1.5 cm. The reverberation time $T_{60dB}$ is about 700 ms for a speech weighted noise. The desired speech
signal and the noise signals are uncorrelated. The desired speech source consists of sentences spoken by a
male speaker. Both the speech and the noise signal have a level of 70 dB SPL at the center of the head. The
desired speech source and noise sources are positioned at a distance of 1 meter from the head: the speech
source in front of the head, the noise sources at an angle $\theta$ w.r.t. the speech source. For evaluation purposes,
the speech and noise signal have been recorded separately.

The microphone signals are pre-whitened prior to processing to improve intelligibility [37], and the 
output is accordingly de-whitened. In the experiments, the microphones have been calibrated by means of 
recordings of an anechoic speech weighted noise signal positioned at 0° measured while the microphone 
array was mounted on the head. A delay-and-sum beamformer is used as a fixed beamformer, since -in case 
of small microphone interspacing - it is robust to model errors. The blocking matrix B pairwise subtracts 
the time aligned calibrated microphone signals. 
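
A minimal sketch of such a spatial pre-processor is given below. It assumes that the microphone signals are already time aligned and calibrated, that the delay-and-sum beamformer averages them with equal weights, and that the blocking matrix subtracts adjacent microphones; the function name is hypothetical.

```python
def spatial_preprocessor(mics):
    """mics: list of M time-aligned, calibrated microphone signals (equal length).

    Returns the speech reference (delay-and-sum output) and the M - 1 noise
    references (pairwise differences of adjacent microphones).
    """
    M = len(mics)
    n = len(mics[0])
    speech_ref = [sum(m[t] for m in mics) / M for t in range(n)]
    noise_refs = [[mics[i][t] - mics[i + 1][t] for t in range(n)]
                  for i in range(M - 1)]
    return speech_ref, noise_refs


# Perfectly matched microphones: the target cancels in the noise references.
s = [1.0, 2.0]
_, matched = spatial_preprocessor([s, s, s])
# A 4 dB gain mismatch on the second microphone leaks speech into them.
g = 10.0 ** (4.0 / 20.0)
_, mismatched = spatial_preprocessor([s, [g * v for v in s], s])
```

The non-zero entries of `mismatched` are the speech leakage that the regularization term of the SP-SDW-MWF is meant to take into account.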

The performance of the FD stochastic gradient algorithms is evaluated for a filter length $L = 32$ taps per
channel, $\rho' = 0.8$ and $\gamma = 0$. To exclude the effect of the spatial pre-processor, the performance measures
are calculated w.r.t. the output of the fixed beamformer. The sensitivity of the algorithms to errors in
the assumed signal model is illustrated for microphone mismatch, e.g., a gain mismatch $\Upsilon_2 = 4$ dB of the
second microphone. Among the different possible signal model errors, microphone mismatch in particular
was found to be harmful to the performance of the GSC in a hearing aid application [17]. In hearing aids,
microphones are rarely matched in gain and phase. In [3], gain and phase differences between microphone
characteristics of up to 6 dB and 10°, respectively, have been reported.

4.3.2 Comparison of different FD stochastic gradient techniques 

Figure 9(a) and (b) compare the performance of the different FD Stochastic Gradient (SG) SP-SDW-MWF
algorithms without $\mathbf{w}_0$ (i.e., the SDR-GSC) as a function of the trade-off parameter $\mu$ for a stationary and
a non-stationary (e.g., multi-talker babble) noise source, respectively, at 90°. To analyze the impact of the
approximation (64) on the performance, the result of an FD implementation of (63), which uses the clean
speech, is depicted too. For both noise scenarios, the stochastic gradient algorithm significantly outperforms
the NLMS based algorithm, especially for $1/\mu \to 1$. Without Low Pass (LP) filter, both algorithms achieve
a worse improvement compared to (63), especially for large $1/\mu$. For a stationary speech-like noise source,
the FD-SG algorithm does not suffer too much from approximation (64). In a highly time-varying noise
scenario, such as multi-talker babble, the limited averaging of $\mathbf{r}[k]$ in the FD implementation does not
suffice to maintain the large noise reduction achieved by (63). The loss in noise reduction performance
could be reduced by decreasing the step size $\rho'$, at the expense of a reduced convergence speed. Applying
the low pass filter (95) significantly improves the performance for all values of $1/\mu$, while changes in the noise scenario
can still be tracked.

Figure 10 plots the improvement $\Delta\mathrm{SNR}_{intellig}$ and the distortion $\mathrm{SD}_{intellig}$ of the SP-SDW-MWF ($1/\mu = 0.5$) with and
without filter $\mathbf{w}_0$ for the babble noise scenario as a function of the exponential weighting
factor $\lambda$ of the LP filter (see (95)). Performance clearly improves for increasing $\lambda$. For small $\lambda$, the
SP-SDW-MWF with $\mathbf{w}_0$ suffers from a larger excess error, and hence a worse $\Delta\mathrm{SNR}_{intellig}$, compared to the
SP-SDW-MWF without $\mathbf{w}_0$. This is due to the larger dimensions of $\varepsilon\{\mathbf{y}^s\mathbf{y}^{s,H}\}$.

The LP filter prevents the desired speech from being distorted by a highly time-varying filter $\mathbf{w}^s$. In contrast
to a decrease in step size $\rho'$, the LP filter does not compromise tracking of changes in the noise scenario.
As an illustration, Figure 11 plots the convergence behavior of the FD stochastic gradient algorithm without
$\mathbf{w}_0$ (i.e., the SDR-GSC) for $\lambda = 0$ and $\lambda = 0.9998$, respectively, when the noise source position suddenly
changes from 90° to 180°. A gain mismatch $\Upsilon_2$ of 4 dB was applied to the second microphone. To avoid fast
fluctuations in the residual noise energy $\varepsilon_n^2$ and speech distortion energy $\varepsilon_d^2$, the desired and interfering noise
source in this experiment are stationary and speech-like. The upper figure depicts the residual noise energy $\varepsilon_n^2$
as a function of the number of input samples; the lower figure plots the residual speech distortion $\varepsilon_d^2$ during
speech + noise periods as a function of the number of speech + noise samples. Both algorithms (i.e., $\lambda = 0$
and $\lambda = 0.9998$) have about the same convergence rate. When the change in position occurs, the algorithm
with $\lambda = 0.9998$ even converges faster. For $\lambda = 0$, the approximation error (64) remains large for a while
since the noise vectors in the buffer are not up to date. For $\lambda = 0.9998$, the impact of the instantaneous large
approximation error is reduced thanks to the low pass filter.

4.3.3 Comparison with SPA

Figure 12 and Figure 13 compare the performance of the FD stochastic gradient algorithm with LP filter
($\lambda = 0.9998$) and the FD-NLMS based SPA in a multiple noise source scenario. The noise scenario consists
of 5 multi-talker babble noise sources positioned at angles 75°, 120°, 180°, 240°, 285° w.r.t. the desired
source at 0°. To assess the sensitivity of the algorithms to errors in the assumed signal model, the
influence of microphone mismatch, e.g., a gain mismatch $\Upsilon_2 = 4$ dB of the second microphone, on the
performance is depicted too. In Figure 12, the improvement $\Delta\mathrm{SNR}_{intellig}$ and the distortion $\mathrm{SD}_{intellig}$ of the
SP-SDW-MWF with and without filter $\mathbf{w}_0$ are depicted as a function of the trade-off factor $1/\mu$. Figure 13
shows the results of the QIC-GSC

$$\mathbf{w}^H\mathbf{w} \le \beta^2 \qquad (99)$$

for different constraint values $\beta^2$, which is implemented using the FD-NLMS based SPA.
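
The constraint (99) can be enforced after every filter update by a scaling step. The sketch below assumes that SPA stands for the scaled projection algorithm, in which a filter that violates the constraint is scaled back onto the ball $\mathbf{w}^H\mathbf{w} \le \beta^2$; the function name is hypothetical.

```python
import math


def scaled_projection(w, beta_sq):
    """Scale w back onto the ball w^H w <= beta^2 when the constraint is violated."""
    norm_sq = sum(abs(wi) ** 2 for wi in w)
    if norm_sq <= beta_sq:
        return list(w)                      # constraint already satisfied
    scale = math.sqrt(beta_sq / norm_sq)    # one division and one square root
    return [wi * scale for wi in w]


w_constrained = scaled_projection([3.0, 4.0], beta_sq=1.0)   # norm 5 scaled to norm 1
```

The single division and square root per update are consistent with the extra "1 D + 1 Sq" listed for the NLMS based SPA in Table 1.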

Both the SPA and the stochastic gradient based SP-SDW-MWF increase the robustness of the GSC
(i.e., the SP-SDW-MWF without $\mathbf{w}_0$ and $1/\mu = 0$). For a given maximum allowable distortion $\mathrm{SD}_{intellig}$,
the SP-SDW-MWF with and without $\mathbf{w}_0$ achieve a better noise reduction performance than the SPA. The
performance of the SP-SDW-MWF with $\mathbf{w}_0$ is, in contrast to the SP-SDW-MWF without $\mathbf{w}_0$, not affected
by microphone mismatch. In the absence of model errors, the SP-SDW-MWF with $\mathbf{w}_0$ achieves a slightly
worse performance than the SP-SDW-MWF without $\mathbf{w}_0$. With $\mathbf{w}_0$, the estimate of $\frac{1}{\mu}\varepsilon\{\mathbf{y}^s\mathbf{y}^{s,H}\}$ is less
accurate due to the larger dimensions of $\frac{1}{\mu}\varepsilon\{\mathbf{y}^s\mathbf{y}^{s,H}\}$ (see also Figure 10).

In short, the proposed stochastic gradient implementation of the SP-SDW-MWF preserves the benefit of 
the SP-SDW-MWF over the QIC-GSC. 

4.4 Conclusions 

In this paper, we derived time-domain and frequency-domain stochastic gradient algorithms for the
SP-SDW-MWF and compared their performance to the SPA. Starting from the cost function of the SP-SDW-MWF,
a time-domain stochastic gradient algorithm has been derived in Section 4.1. In addition, the LMS
based algorithm [26] has been extended so that it applies to the SP-SDW-MWF. To increase the convergence speed
and reduce complexity, a frequency-domain implementation has been proposed. Both the stochastic gradient
and the LMS based algorithm suffer from a large excess error when applied in highly time-varying noise
scenarios. In Section 4.2, we show that the excess error is reduced by applying a low pass filter to the part of
the gradient estimate that limits speech distortion. The low pass filtering avoids a highly time-varying distortion
of the desired speech component while not degrading the tracking performance needed in time-varying
noise scenarios. Section 4.3 compares the performance of the different frequency-domain stochastic gradient
algorithms for a hearing aid application. The stochastic gradient SP-SDW-MWF outperforms the LMS
based algorithm, while complexity is not increased. For a non-stationary noise scenario, the LMS based and
stochastic gradient SP-SDW-MWF suffer from a reasonably large excess error. Experimental results show
that the low pass filtering significantly improves the performance of the stochastic gradient algorithm and
does not compromise the tracking of changes in the noise scenario. In addition, experiments demonstrate
that the proposed stochastic gradient algorithm preserves the benefit of the SP-SDW-MWF over the QIC-GSC.
The limited computational cost and the better noise reduction performance of the proposed algorithm make
it a good alternative to the SPA for implementation in hearing aids.

A Efficient frequency-domain implementation using correlation matrices

In Section 4 stochastic gradient algorithms in the time-domain and in the frequency-domain have been 
developed for implementing the Speech Distortion Weighted Multi-channel Wiener Filter (SDW-MWF). 
These algorithms however require large data buffers for calculating the regularisation term required in the 
filter update formulas, resulting in a large memory usage. In this addendum it is shown that by approximating 
this regularisation term in the frequency-domain, (diagonal) speech and noise correlation matrices need 
to be stored instead of data buffers, such that the memory usage is decreased drastically, while also the 
computational complexity is further reduced. Experimental results demonstrate that this approximation 
results in a small - positive or negative - performance difference, such that the proposed algorithm preserves 
the robustness benefit of the SP-SDW-MWF over the QIC-GSC, while both its computational complexity 
and memory usage are now comparable to the NLMS-based SPA for implementing the QIC-GSC. 

A.1 Stochastic Gradient algorithms

In this section we first briefly review the stochastic gradient algorithm in the time-domain (cf. Section
4.1.1) and the calculation of the regularisation term $\mathbf{r}[k]$ (cf. Sections 4.1.1 and 4.2.1). We then show
that by approximating this regularisation term in the frequency-domain the memory usage can be reduced
drastically.

A.1.1 Time-Domain implementation

In Section 4.1.1 a stochastic gradient algorithm in the time-domain has been developed for minimising the
cost function in (42), i.e.

$$\mathbf{w}[k+1] = \mathbf{w}[k] + \rho\left\{\mathbf{y}^n[k]\left(y_0^{n,*}[k-\Delta] - \mathbf{y}^{n,H}[k]\,\mathbf{w}[k]\right) - \mathbf{r}[k]\right\} \qquad (109)$$

$$\mathbf{r}[k] = \frac{1}{\mu}\,\mathbf{y}^s[k]\,\mathbf{y}^{s,H}[k]\,\mathbf{w}[k] \qquad (110)$$

$$\rho = \frac{\rho'}{\mathbf{y}^{n,H}[k]\,\mathbf{y}^n[k] + \frac{1}{\mu}\,\mathbf{y}^{s,H}[k]\,\mathbf{y}^s[k] + \delta}, \qquad (111)$$

with $\rho$ the normalised step size of the adaptive algorithm, $\delta$ a small positive constant, and $\mathbf{w}[k]$, $\mathbf{y}^n[k]$, $\mathbf{y}^s[k]$
and $\mathbf{r}[k]$ $NL$-dimensional vectors. For $1/\mu = 0$ and no filter $\mathbf{w}_0$ present, (109) reduces to an NLMS-type
update formula often used in the GSC and operated during noise-only periods [7, 10, 13, 42]. For $1/\mu \neq 0$,
the additional regularisation term $\mathbf{r}[k]$ limits speech distortion due to possible signal model errors.

In order to compute (110), knowledge about the (instantaneous) correlation matrix $\mathbf{y}^s[k]\,\mathbf{y}^{s,H}[k]$ of the
clean speech signal is required, which is obviously not available. In order to avoid the need for calibration,
it is suggested in Section 4.1.1 to store $L$-dimensional speech+noise vectors $\mathbf{y}_i[k]$, $i = M-N \ldots M-1$,
during speech periods in a circular speech+noise buffer $\mathbf{B}_1 \in \mathbb{R}^{N \times L_{buf_1}}$ and to adapt the filter $\mathbf{w}[k]$ using
(109) during noise-only periods (see footnote 20), based on approximating the regularisation term in (110) by

$$\mathbf{r}[k] = \frac{1}{\mu}\left(\mathbf{y}_{buf_1}[k]\,\mathbf{y}^H_{buf_1}[k] - \mathbf{y}^n[k]\,\mathbf{y}^{n,H}[k]\right)\mathbf{w}[k], \qquad (112)$$

with $\mathbf{y}_{buf_1}[k]$ a vector from the circular speech+noise buffer $\mathbf{B}_1$, cf. (72). However, as has been indicated
in Section 4.1.1, this estimate of $\mathbf{r}[k]$ is quite poor, resulting in a large excess error, especially for small $\mu$
and large $\rho'$. Hence, it has been suggested to use an estimate of the average clean speech correlation matrix
$\varepsilon\{\mathbf{y}^s[k]\,\mathbf{y}^{s,H}[k]\}$ in (110), such that $\mathbf{r}[k]$ can be computed as

$$\mathbf{r}[k] = \frac{1}{\mu}(1-\lambda)\sum_{l=0}^{k}\lambda^{k-l}\left(\mathbf{y}_{buf_1}[l]\,\mathbf{y}^H_{buf_1}[l] - \mathbf{y}^n[l]\,\mathbf{y}^{n,H}[l]\right)\cdot\mathbf{w}[k], \qquad (113)$$

with $\lambda$ an exponential weighting factor and the step size $\rho$ in (111) now equal to

$$\rho = \frac{\rho'}{\mathbf{y}^{n,H}[k]\,\mathbf{y}^n[k] + \frac{1}{\mu}(1-\lambda)\left|\sum_{l=0}^{k}\lambda^{k-l}\left(\mathbf{y}^H_{buf_1}[l]\,\mathbf{y}_{buf_1}[l] - \mathbf{y}^{n,H}[l]\,\mathbf{y}^n[l]\right)\right| + \delta}.$$

For stationary noise a small $\lambda$, i.e. $1/(1-\lambda) \sim NL$, suffices. However, in practice the speech and the
noise signals are often spectrally highly non-stationary (e.g. multi-talker babble noise), whereas their
long-term spectral and spatial characteristics usually vary more slowly in time. Spectrally highly non-stationary
noise can still be spatially suppressed by using an estimate of the long-term correlation matrix in $\mathbf{r}[k]$, i.e.
$1/(1-\lambda) \gg NL$.

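In order to avoid expensive matrix operations for computing (113), it is assumed in Section 4.2.1 that
$\mathbf{w}[k]$ varies slowly in time, i.e. $\mathbf{w}[k] \approx \mathbf{w}[l]$, such that (113) can be approximated with vector instead of
matrix operations by directly applying a low-pass filter to the regularisation term $\mathbf{r}[k]$, cf. (100),

$$\mathbf{r}[k] = \frac{1}{\mu}(1-\lambda)\sum_{l=0}^{k}\lambda^{k-l}\left(\mathbf{y}_{buf_1}[l]\,\mathbf{y}^H_{buf_1}[l] - \mathbf{y}^n[l]\,\mathbf{y}^{n,H}[l]\right)\mathbf{w}[l] \qquad (114)$$

$$\phantom{\mathbf{r}[k]} = \lambda\,\mathbf{r}[k-1] + (1-\lambda)\frac{1}{\mu}\left(\mathbf{y}_{buf_1}[k]\,\mathbf{y}^H_{buf_1}[k] - \mathbf{y}^n[k]\,\mathbf{y}^{n,H}[k]\right)\mathbf{w}[k]. \qquad (115)$$

However, as will be shown in the next paragraph, this assumption is actually not required in a
frequency-domain implementation.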
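
The step from the exponentially weighted sum (114) to the recursion (115) can be checked numerically. In the sketch below, `terms[l]` stands for the per-sample vector $\frac{1}{\mu}(\mathbf{y}_{buf_1}[l]\mathbf{y}^H_{buf_1}[l] - \mathbf{y}^n[l]\mathbf{y}^{n,H}[l])\mathbf{w}[l]$, and the recursion is started from $\mathbf{r}[-1] = 0$; the function names and the example numbers are hypothetical.

```python
def r_explicit(terms, lam):
    """(1 - lam) * sum_l lam^(k-l) * terms[l], evaluated at the last index k."""
    k = len(terms) - 1
    dim = len(terms[0])
    return [(1.0 - lam) * sum(lam ** (k - l) * terms[l][i] for l in range(k + 1))
            for i in range(dim)]


def r_recursive(terms, lam):
    """r[k] = lam * r[k-1] + (1 - lam) * terms[k], starting from r[-1] = 0."""
    r = [0.0] * len(terms[0])
    for t in terms:
        r = [lam * ri + (1.0 - lam) * ti for ri, ti in zip(r, t)]
    return r


terms = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
explicit = r_explicit(terms, lam=0.9)
recursive = r_recursive(terms, lam=0.9)   # same values up to rounding
```

The recursion needs only the previous vector $\mathbf{r}[k-1]$, which is why it replaces the matrix sum of (113) at the cost of assuming a slowly varying $\mathbf{w}[k]$.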

A.1.2 Efficient Frequency-Domain implementation 

In Section 4.2.2 the (improved) stochastic gradient algorithm in the time-domain has been converted to
a frequency-domain implementation by using a block-formulation and overlap-save procedures (similar
to standard adaptive filtering techniques in the frequency-domain [43]). However, the frequency-domain
algorithm described in Section 4.2.2 (Algorithm 3) requires many data buffers and hence the storage of a
large amount of data (see footnote 21). A substantial memory (and computational complexity) reduction can be achieved
by the following two steps:

20 In Section 4.1.1 it has been shown that storing noise-only vectors $\mathbf{y}_i[k] = \mathbf{y}_i^n[k]$, $i = 0 \ldots M-1$, during noise-only periods
in a circular noise buffer $\mathbf{B}_2 \in \mathbb{R}^{M \times L_{buf_2}}$ additionally allows adaptation during speech periods.

• When using (113) instead of (115) for calculating the regularisation term, correlation matrices instead
of data samples need to be stored. The frequency-domain implementation of the resulting algorithm
is then summarised in Algorithm 4, where $2L \times 2L$-dimensional speech and noise correlation
matrices $\mathbf{S}_{ij}[k]$ and $\mathbf{S}^{n}_{ij}[k]$, $i, j = M-N \ldots M-1$, are used for calculating the regularisation term
$\mathbf{R}_i[k]$ and (part of) the step size $\mathbf{\Lambda}[k]$. These correlation matrices are updated respectively during
speech-periods and noise-only-periods (footnote 22). However, this first step does not necessarily reduce the
memory usage ($N L_{\mathrm{buf}_1}$ for data buffers vs. $2(NL)^2$ for correlation matrices) and will even increase
the computational complexity, since the correlation matrices are not diagonal.

• The correlation matrices in the frequency-domain can be approximated by diagonal matrices, since
$\mathbf{F}\mathbf{k}^T\mathbf{k}\mathbf{F}^{-1}$ in Algorithm 4 can be well approximated by $\mathbf{I}_{2L}/2$ [44; 45]. Hence, the speech and the
noise correlation matrices are updated as

\[ \mathbf{S}_{ij}[k] = \lambda\,\mathbf{S}_{ij}[k-1] + (1-\lambda)\,\mathbf{Y}_i^{H}[k]\,\mathbf{Y}_j[k]/2 , \tag{116} \]

\[ \mathbf{S}^{n}_{ij}[k] = \lambda\,\mathbf{S}^{n}_{ij}[k-1] + (1-\lambda)\,\mathbf{Y}_i^{n,H}[k]\,\mathbf{Y}_j^{n}[k]/2 , \tag{117} \]

leading to a significant reduction in memory usage and computational complexity, cf. Section A.2,
while having a minimal impact on the performance and the robustness, cf. Section A.3. We will refer
to this algorithm as Algorithm 5. Algorithm 5 is in fact quite similar to the algorithm presented in
[46], which is derived directly from a frequency-domain cost function. Some major differences
exist, however: e.g. in [46] the regularisation term $\mathbf{R}_i[k]$ is absent, the term $\mathbf{F}\mathbf{g}\mathbf{F}^{-1}$ is also approximated
by $\mathbf{I}_{2L}/2$, and the speech and the noise correlation matrices are block-diagonal.
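The diagonal approximation can be checked numerically. The sketch below builds the $2L \times 2L$ DFT matrix and the overlap-save selection matrix $\mathbf{k} = [\mathbf{0}_L \; \mathbf{I}_L]$ (the usual overlap-save form, assumed here), and inspects $\mathbf{F}\mathbf{k}^T\mathbf{k}\mathbf{F}^{-1}$. The helper `update_diag_corr` is a hypothetical name for a recursive update of the kind in (116)-(117), operating on the stored diagonals only.

```python
import numpy as np

L = 32
F = np.fft.fft(np.eye(2 * L))                  # 2L x 2L DFT matrix
k = np.hstack([np.zeros((L, L)), np.eye(L)])   # k = [0_L  I_L]
G = F @ k.T @ k @ np.linalg.inv(F)             # F k^T k F^{-1}

diag_G = np.real(np.diag(G))                   # every diagonal entry equals 1/2
max_off = np.max(np.abs(G - np.diag(np.diag(G))))  # off-diagonal part that the approximation drops

def update_diag_corr(S_prev, Yi, Yj, lam):
    """Recursive update on diagonals: S = lam*S_prev + (1-lam) * conj(Yi) * Yj / 2.

    S_prev, Yi and Yj are length-2L vectors holding the diagonals of the
    corresponding diagonal matrices, so only 2L values are stored per matrix.
    """
    return lam * S_prev + (1.0 - lam) * np.conj(Yi) * Yj / 2.0
```

The diagonal of `G` is exactly 1/2, while its off-diagonal entries are not zero; replacing `G` by $\mathbf{I}_{2L}/2$ therefore discards cross-frequency terms, which is why the text calls it an approximation.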

A.2 Memory usage and computational complexity 

Table 2 summarises the computational complexity and the memory usage of the frequency-domain NLMS-
based SPA for implementing the QIC-GSC [14] (footnote 23) and the frequency-domain stochastic-gradient
algorithms for implementing the SDW-MWF (Algorithm 3 and Algorithm 5). As in Section 4.2.3, the
computational complexity is expressed as the number of operations per second (MIPS), while the memory usage
is expressed in kWords. We assume that one complex multiplication is equivalent to 4 real multiplications and 2

21 In order to achieve a good performance, typical values for the buffer lengths $L_{\mathrm{buf}_1}$ and $L_{\mathrm{buf}_2}$ of the
circular buffers $\mathbf{B}_1$ and $\mathbf{B}_2$ are 10000...20000.

22 When using correlation matrices, filter adaptation can only take place during noise-only-periods, since
during speech-periods the desired signal $\mathbf{d}[k]$ cannot be constructed from the noise-buffer $\mathbf{B}_2$ any more.

23 The computational complexity of the frequency-domain QIC-GSC using SPA also represents the complexity
when the adaptive filter is only updated during noise-only-periods.
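Algorithm 4 Frequency-domain implementation with correlation matrices (without approximation)

Initialisation and matrix definitions:

    $\mathbf{W}_i[0] = [\,0 \;\cdots\; 0\,]^T$, $i = M-N \ldots M-1$
    $P_m[0] = \delta_m$, $m = 0 \ldots 2L-1$
    $\mathbf{F}$ = $2L \times 2L$-dimensional DFT matrix
    $\mathbf{g} = \begin{bmatrix} \mathbf{I}_L & \mathbf{0}_L \\ \mathbf{0}_L & \mathbf{0}_L \end{bmatrix}$, $\mathbf{k} = [\,\mathbf{0}_L \;\; \mathbf{I}_L\,]$
    $\mathbf{0}_L$ = $L \times L$-dimensional matrix with zeros, $\mathbf{I}_L$ = $L \times L$-dimensional identity matrix

For each new block of L samples (per channel):

    $\mathbf{d}[k] = [\,y_0[kL-\Delta] \;\cdots\; y_0[kL-\Delta+L-1]\,]^T$
    $\mathbf{Y}_i[k] = \mathrm{diag}\{\mathbf{F}\,[\,y_i[kL-L] \;\cdots\; y_i[kL+L-1]\,]^T\}$, $i = M-N \ldots M-1$

Output signal:

    $\mathbf{e}[k] = \mathbf{d}[k] - \mathbf{k}\mathbf{F}^{-1}\sum_{j=M-N}^{M-1}\mathbf{Y}_j[k]\mathbf{W}_j[k]$, $\mathbf{E}[k] = \mathbf{F}\mathbf{k}^T\mathbf{e}[k]$

If speech detected:

    $\mathbf{S}_{ij}[k] = (1-\lambda)\sum_{l=0}^{k}\lambda^{k-l}\,\mathbf{Y}_i^{H}[l]\,\mathbf{F}\mathbf{k}^T\mathbf{k}\mathbf{F}^{-1}\,\mathbf{Y}_j[l] = \lambda\,\mathbf{S}_{ij}[k-1] + (1-\lambda)\,\mathbf{Y}_i^{H}[k]\,\mathbf{F}\mathbf{k}^T\mathbf{k}\mathbf{F}^{-1}\,\mathbf{Y}_j[k]$

If noise detected: $\mathbf{Y}_i[k] = \mathbf{Y}_i^{n}[k]$

    $\mathbf{S}^{n}_{ij}[k] = (1-\lambda)\sum_{l=0}^{k}\lambda^{k-l}\,\mathbf{Y}_i^{n,H}[l]\,\mathbf{F}\mathbf{k}^T\mathbf{k}\mathbf{F}^{-1}\,\mathbf{Y}_j^{n}[l] = \lambda\,\mathbf{S}^{n}_{ij}[k-1] + (1-\lambda)\,\mathbf{Y}_i^{n,H}[k]\,\mathbf{F}\mathbf{k}^T\mathbf{k}\mathbf{F}^{-1}\,\mathbf{Y}_j^{n}[k]$

Update formula (only during noise-only-periods):

    $\mathbf{R}_i[k] = \frac{1}{\mu}\sum_{j=M-N}^{M-1}\left[\mathbf{S}_{ij}[k] - \mathbf{S}^{n}_{ij}[k]\right]\mathbf{W}_j[k]$, $i = M-N \ldots M-1$
    $\mathbf{W}_i[k+1] = \mathbf{W}_i[k] + \mathbf{F}\mathbf{g}\mathbf{F}^{-1}\mathbf{\Lambda}[k]\left\{\mathbf{Y}_i^{n,H}[k]\mathbf{E}[k] - \mathbf{R}_i[k]\right\}$, $i = M-N \ldots M-1$

with

    $\mathbf{\Lambda}[k] = \frac{2\rho'}{L}\,\mathrm{diag}\{P_0^{-1}[k], \ldots, P_{2L-1}^{-1}[k]\}$
    $P_m[k] = \gamma P_m[k-1] + (1-\gamma)\left(P_{1,m}[k] + P_{2,m}[k]\right)$, $m = 0 \ldots 2L-1$
    $P_{1,m}[k] = \sum_{j=M-N}^{M-1}\left|Y^{n}_{j,m}[k]\right|^2$, $P_{2,m}[k] = \frac{1}{\mu}\sum_{j=M-N}^{M-1}$ [summand illegible in the source], $m = 0 \ldots 2L-1$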
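The output-signal step of the algorithm relies on the overlap-save principle: a 2L-point FFT of two consecutive input blocks is multiplied by the frequency-domain filter, and only the last L samples of the inverse transform are kept. The following single-channel sketch illustrates that principle with real-valued signals; `overlap_save_block` is a hypothetical helper, and the frequency-domain filter is assumed to be the 2L-point FFT of the zero-padded L-tap filter.

```python
import numpy as np

def overlap_save_block(y_prev, y_cur, w):
    """Filter one block of L samples with an L-tap filter using a 2L-point FFT."""
    L = len(w)
    Y = np.fft.fft(np.concatenate([y_prev, y_cur]))    # diagonal of the input matrix
    W = np.fft.fft(np.concatenate([w, np.zeros(L)]))   # zero-padded filter
    # Keeping the last L samples of the inverse FFT plays the role of k F^{-1} Y W:
    # these samples are free of circular wrap-around.
    return np.real(np.fft.ifft(Y * W))[L:]

rng = np.random.default_rng(1)
L = 32
w = rng.standard_normal(L)
x = rng.standard_normal(4 * L)

prev = np.zeros(L)                 # zero history before the first block
out = []
for cur in x.reshape(-1, L):
    out.append(overlap_save_block(prev, cur, w))
    prev = cur
out = np.concatenate(out)

ref = np.convolve(x, w)[: len(x)]  # time-domain reference
max_err = np.max(np.abs(out - ref))
```

The block-wise result matches the time-domain convolution to machine precision, which is what allows the error signal to be formed per block of L samples.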
Computational complexity

Algorithm                   Update formula                                           Adaptation of step size       MIPS
--------------------------  -------------------------------------------------------  ----------------------------  ------------------
NLMS based SPA (QIC-GSC)    (14M - 11 - [illegible]) + (6M - 2) log2(2L) MAC         (2M + 2) MAC + 1 D            2.16
                            + 1/L Sq + 1/L D
SG with LP (Algorithm 3)    (26N + 4 - [illegible]) + (6N + 10) log2(2L) MAC         (4N + 6) MAC + 1 D + 1 Abs    3.22 (a), 4.27 (b)
SG with LP (Algorithm 5)    (10N^2 + 13N - [illegible]) + (6N + 4) log2(2L) MAC      (2N + 4) MAC + 1 D + 1 Abs    2.71 (a), 4.31 (b)

Memory usage

Algorithm                   Memory usage (words)                                     kWords
--------------------------  -------------------------------------------------------  --------------------
QIC-GSC                     4(M - 1)L + 6L                                           0.45
Algorithm 3                 2N L_buf1 + 6LN + 7L                                     40.61 (a), 60.80 (b)
Algorithm 5                 4LN^2 + 6LN + 7L                                         1.95 (b)

Table 2: Computational complexity and memory usage for $M = 3$, $L = 32$, $f_s = 16$ kHz, $L_{\mathrm{buf}_1} = 10000$,
(a) $N = M - 1$, (b) $N = M$
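The memory column of Table 2 can be reproduced by evaluating the three word-count expressions directly. The sketch below assumes the expressions read 4(M-1)L + 6L for the QIC-GSC, 2N L_buf1 + 6LN + 7L for Algorithm 3 and 4LN^2 + 6LN + 7L for Algorithm 5, with the parameter values of the table caption; the function name `memory_kwords` is hypothetical.

```python
def memory_kwords(M, L, N, L_buf1):
    """Memory usage in kWords for the three implementations compared in Table 2."""
    words = {
        "QIC-GSC": 4 * (M - 1) * L + 6 * L,
        "Algorithm 3": 2 * N * L_buf1 + 6 * L * N + 7 * L,   # dominated by the data buffer
        "Algorithm 5": 4 * L * N**2 + 6 * L * N + 7 * L,     # quadratic in N, no data buffer
    }
    return {name: count / 1000 for name, count in words.items()}

case_a = memory_kwords(M=3, L=32, N=2, L_buf1=10000)   # (a) N = M - 1
case_b = memory_kwords(M=3, L=32, N=3, L_buf1=10000)   # (b) N = M

for label, case in (("a", case_a), ("b", case_b)):
    print(label, {name: round(value, 2) for name, value in case.items()})
```

Under these assumptions the data buffer term 2N L_buf1 accounts for nearly all of the memory of Algorithm 3, which is the term that Algorithm 5 removes.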



real additions and that a 2L-point FFT of a real input vector requires $2L \log_2 2L$ real MACs (assuming the
radix-2 FFT algorithm). From this table we can draw the following conclusions:

• The computational complexity of the SDW-MWF (Algorithm 3) with filter $\mathbf{w}_0$ is about twice the
complexity of the QIC-GSC (and even less if the filter $\mathbf{w}_0$ is not present). The approximation of the
regularisation term in Algorithm 5 further reduces the computational complexity. However, this only
remains true for a small number of input channels, since the approximation introduces a quadratic
term $O(N^2)$.

• Due to the storage of the data samples used in the circular speech+noise-buffer $\mathbf{B}_1$, the memory usage
of the SDW-MWF (Algorithm 3) is quite high in comparison with the QIC-GSC (depending on the
size of the data buffer $L_{\mathrm{buf}_1}$ of course). By using the approximation of the regularisation term in
Algorithm 5, the memory usage can be reduced drastically, since now diagonal correlation matrices
instead of data buffers need to be stored. Note however that also for the memory usage a quadratic
term $O(N^2)$ is present.
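The effect of the quadratic memory term can be made concrete with a short calculation. Assuming the memory expressions of Table 2 read $2NL_{\mathrm{buf}_1} + 6LN + 7L$ words for Algorithm 3 and $4LN^2 + 6LN + 7L$ words for Algorithm 5, the common terms cancel and Algorithm 5 needs less memory as long as

\[ 4LN^2 < 2NL_{\mathrm{buf}_1} \iff N < \frac{L_{\mathrm{buf}_1}}{2L} = \frac{10000}{2 \cdot 32} \approx 156 , \]

so for the small channel counts considered here ($N \le 3$) the quadratic term is far from dominant.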

A.3 Experimental results

In this paragraph it is shown that practically no performance difference exists between Algorithm 3 and Al- 
gorithm 5, such that the SDW-MWF using the implementation proposed in this addendum indeed preserves 
its robustness benefit over the GSC (and the QIC-GSC). 
A.3.1 Set-up

The same set-up has been used as in Section 4.3.1. A 3-microphone BTE hearing aid with 3 omnidirectional
microphones (Knowles FG-3452) has been mounted on a dummy head in an office room. The spacing
between the first and the second microphone is about 1 cm and the spacing between the second and
the third microphone is about 1.5 cm. The reverberation time $T_{60\mathrm{dB}}$ is about 700 ms for a speech weighted
noise. The desired speech source and the noise sources are positioned at a distance of 1 m from the head. 
The desired speech source is positioned in front of the head (at 0°) and consists of English sentences. The 
noise scenario consists of five multi-talker babble noise sources, positioned at 75°, 120°, 180°, 240° and 
285°. The desired signal and the total noise signal both have a level of 70 dB SPL at the centre of the head. 
For evaluation purposes, the speech and the noise signals have been recorded separately. 

The microphone signals are pre-whitened prior to processing to improve intelligibility [38], and the
output is accordingly de-whitened. In the experiments, the microphones have been calibrated by means
of recordings of an anechoic speech weighted noise signal positioned at 0°, measured while the BTE was
mounted on the head. A delay-and-sum beamformer is used as the fixed beamformer and the blocking
matrix pairwise subtracts the time-aligned calibrated microphone signals.

The performance of the stochastic gradient algorithms in the frequency-domain is evaluated for a filter
length $L = 32$ per channel, $\rho' = 0.8$, $\gamma = 0.95$ and $\lambda = 0.9998$. For all considered algorithms, filter
adaptation only takes place during noise-only periods. To exclude the effect of the spatial pre-processor, the
performance measures are calculated with respect to the output of the fixed beamformer. The sensitivity of
the algorithms against errors in the assumed signal model is illustrated for microphone mismatch, i.e. a gain
mismatch $T_2 = 4$ dB at the second microphone.
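The role of the spatial pre-processor, and the reason a gain mismatch matters, can be illustrated with a toy example. The sketch below is not the measured set-up: it assumes perfectly time-aligned signals, models delay-and-sum as a plain average, models the blocking matrix as differences of adjacent microphones, and interprets 4 dB as an amplitude gain of $10^{4/20}$. The function name `spatial_preprocessor` is hypothetical.

```python
import numpy as np

def spatial_preprocessor(mics):
    """mics: array of shape (M, T) with time-aligned, calibrated microphone signals."""
    speech_ref = mics.mean(axis=0)       # delay-and-sum on already aligned signals
    noise_refs = mics[:-1] - mics[1:]    # pairwise subtraction (blocking matrix)
    return speech_ref, noise_refs

rng = np.random.default_rng(2)
s = rng.standard_normal(1000)            # stand-in for the desired speech from 0 degrees
gain = 10.0 ** (4.0 / 20.0)              # 4 dB gain mismatch, about 1.585 in amplitude

matched = np.vstack([s, s, s])
mismatched = np.vstack([s, gain * s, s]) # mismatch at the second microphone

_, leak_matched = spatial_preprocessor(matched)
_, leak_mismatched = spatial_preprocessor(mismatched)

leak_power_matched = np.mean(leak_matched ** 2)        # exactly zero: speech is blocked
leak_power_mismatched = np.mean(leak_mismatched ** 2)  # non-zero: speech leaks into the noise references
```

With matched microphones the noise references contain no speech, whereas the mismatch leaves a scaled copy of the speech in both references; this speech leakage is what the adaptive stage must be made robust against.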

A.3.2 Experimental results

Figures 14 and 15 depict the SNR improvement $\Delta\mathrm{SNR}_{\mathrm{intellig}}$ and the speech distortion $\mathrm{SD}_{\mathrm{intellig}}$ of the
SP-SDW-MWF (with $\mathbf{w}_0$) and the SDR-GSC (without $\mathbf{w}_0$), implemented using Algorithm 3 (solid line) and
Algorithm 5 (dashed line), as a function of the trade-off parameter $1/\mu$. These figures also depict the effect
of a gain mismatch $T_2 = 4$ dB at the second microphone. From these figures it can be observed that
approximating the regularisation term only results in a small performance difference. For most scenarios
the performance is even better (i.e. larger SNR improvement and smaller speech distortion) for Algorithm
5 than for Algorithm 3, probably because Algorithm 3 uses the additional assumption that the filter $\mathbf{w}[k]$
varies slowly in time, cf. (115).

Hence, the SDW-MWF still preserves its robustness benefit over the GSC (and the QIC-GSC) when it is
implemented using the proposed Algorithm 5. E.g. it can be observed that the GSC (i.e. SDR-GSC
with $1/\mu = 0$) will result in a large speech distortion (and a smaller SNR improvement) when microphone
mismatch occurs. Both the SDR-GSC and the SP-SDW-MWF add robustness to the GSC, i.e. the
distortion decreases for increasing $1/\mu$. The performance of the SP-SDW-MWF is again hardly affected by
microphone mismatch.
A.4 Conclusion

In this addendum we have shown that the memory usage (and the computational complexity) of the SDW- 
MWF can be reduced drastically by approximating the regularisation term in the frequency-domain, i.e. by 
computing the regularisation term using (diagonal) frequency-domain correlation matrices instead of time- 
domain data buffers. It has been shown that approximating the regularisation term only results in a small 
performance difference, such that the robustness benefit of the SDW-MWF is preserved, while now both the 
computational complexity and the memory usage are comparable to the NLMS-based SPA for implementing
the QIC-GSC.
Figure 14: SNR improvement of frequency-domain SP-SDW-MWF (Algorithm 3 and Algorithm 5) in a
multiple noise source scenario.

Figure 15: Speech distortion of frequency-domain SP-SDW-MWF (Algorithm 3 and Algorithm 5) in a
multiple noise source scenario.

Figure 1: Concept of the Generalized Sidelobe Canceller.

Figure 2: Equivalent approach of multi-channel Wiener filtering.

Figure 3: Spatially Pre-processed SDW-MWF.

Figure 4: Decomposition of SP-SDW-MWF with $\mathbf{w}_0$ in a multi-channel filter $\mathbf{w}_d$ and single-channel
post-filter $\mathbf{e}_1 - \mathbf{w}_0$.

Figure 5: Influence of $1/\mu$ on the performance of the SDR-GSC for different gain mismatches $T_2$ at the
second microphone.

Figure 6: Influence of $1/\mu$ on the performance of the SP-SDW-MWF with $\mathbf{w}_0$ for different gain mismatches
$T_2$ at the second microphone.

Figure 7: $\Delta\mathrm{SNR}_{\mathrm{intellig}}$ and $\mathrm{SD}_{\mathrm{intellig}}$ for QIC-GSC as a function of $\beta^2$ for different gain mismatches $T_2$ at the
second microphone.

Figure 8: Complexity (expressed in Mops) of TD and FD Stochastic Gradient (SG) algorithm with LP
filtering as a function of filter length L per channel; M = 3. For comparison, the complexity of the standard
NLMS ANC and SPA are depicted too.

Figure 9: Performance of different FD Stochastic Gradient (FD-SG) algorithms; (a) Stationary speech-like
noise at 90°; (b) Multi-talker babble noise at 90°.

Figure 10: Influence of LP filter on performance of FD stochastic gradient SP-SDW-MWF
without $\mathbf{w}_0$ and with $\mathbf{w}_0$. Babble noise at 90°.

Figure 11: Convergence behavior of FD-SG for $\lambda = 0$ and $\lambda = 0.9998$. The noise source position suddenly
changes from 90° to 180° and vice versa.

Figure 12: Performance of FD stochastic gradient implementation of SP-SDW-MWF with LP ($\lambda = 0.9998$)
in a multiple noise source scenario.

Figure 13: Performance of FD SPA in a multiple noise source scenario.

